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ABSTRACT 


We  present  a  novel  regression  framework  centered  on  a  coherent  and  averse  mea¬ 
sure  of  risk,  the  superquantile  risk  (also  called  conditional  value-at-risk),  which  yields 
more  conservatively  fitted  curves  than  classical  least  squares  and  quantile  regressions. 
In  contrast  to  other  generalized  regression  techniques  that  approximate  conditional 
super  quantiles  by  various  combinations  of  conditional  quantiles,  we  directly  and  in 
perfect  analog  to  classical  regression  obtain  superquantile  regression  functions  as  op¬ 
timal  solutions  of  certain  error  minimization  problems.  We  show  the  existence  and 
possible  uniqueness  of  regression  functions,  discuss  the  stability  of  regression  func¬ 
tions  under  perturbations  and  approximation  of  the  underlying  data,  and  propose  an 
extension  of  the  coefficient  of  determination  R-squared  and  Cook’s  distance  for  as¬ 
sessing  the  goodness  of  fit  for  both  quantile  and  superquantile  regression  models.  We 
present  two  classes  of  computational  methods  for  solving  the  superquantile  regression 
problem,  compare  both  methods’  complexity,  and  illustrate  the  methodology  in  eight 
numerical  examples  in  the  areas  of  military  applications,  concerning  mission  employ¬ 
ment  of  U.S.  Navy  helicopter  pilots  and  Portuguese  Navy  submariners,  reliability 
engineering,  uncertainty  quantification,  and  financial  risk  management. 


v 


THIS  PAGE  INTENTIONALLY  LEFT  BLANK 


vi 


TABLE  OF  CONTENTS 


I.  INTRODUCTION .  1 

A.  MOTIVATION  AND  BACKGROUND .  1 

B.  CONNECTIONS  WITH  THE  LITERATURE .  6 

C.  SCOPE  OF  DISSERTATION .  10 

D.  CONTRIBUTIONS .  11 

E.  DISCLAIMER .  12 

F.  ORGANIZATION .  13 

II.  FOUNDATIONS  OF  SUPERQUANTILE  REGRESSION  ...  15 

A.  QUANTILES,  SUPERQUANTILES,  AND  ERRORS .  15 

1.  Definitions  and  Assumptions .  16 

2.  Overview  of  the  Fundamental  Risk  Quadrangle .  18 

3.  Quantile  Regret  and  Error  Measures .  20 

4.  Superquantile  Regret  and  Error  Measures .  22 

B.  SUPERQUANTILE  REGRESSION .  28 

1.  Superquantile  Regression  Problem .  28 

2.  Superquantile  Tracking .  43 

C.  VALIDATION  ANALYSIS .  45 

III.  COMPUTATIONAL  METHODS  .  51 

A.  PRIMAL  METHODS .  52 

1.  Analytical  Integration .  52 

2.  Numerical  Integration .  55 

B.  DUAL  METHODS .  57 

1.  Dualization  of  Risk .  58 

2.  Subgradient  Method  .  61 

3.  Coordinate  Descent  Method .  63 

4.  Cutting  Plane  Method .  63 

vii 


C.  COMPLEXITY .  65 

1.  Least  Squares  Regression .  65 

2.  Quantile  Regression .  66 

3.  Superquantile  Regression  -  Primal  Methods  .  67 

4.  Superquantile  Regression  -  Dual  Methods  .  68 

IV.  NUMERICAL  EXAMPLES .  71 

A.  COMPUTATIONAL  COST .  72 

B.  ENGEL  DATA .  82 

C.  BROWNLEE  STACK  LOSS  PLANT  DATA .  90 

D.  INVESTMENT  ANALYSIS .  95 

E.  U.S.  NAVY  HELICOPTER  PILOTS  DATA .  96 

F.  PORTUGUESE  SUBMARINERS  EFFORT  INDEX .  102 

G.  UNCERTAINTY  QUANTIFICATION .  112 

H.  SUPERQUANTILE  TRACKING .  118 

V.  SUMMARY,  CONCLUSIONS,  AND  FUTURE  WORK  ....  121 

A.  SUMMARY  AND  CONCLUSIONS .  121 

B.  FUTURE  WORK .  123 

LIST  OF  REFERENCES  . 125 

INITIAL  DISTRIBUTION  LIST  . 129 


viii 


LIST  OF  FIGURES 


Figure  1.  Scatter  plot  of  the  data  for  the  constructed  example .  2 

Figure  2.  Least  squares  regression  vs.  quantile  regression  at  a  probability 

level  a  =  0.75,  before  and  after  some  changes  in  the  data  set.  .  .  3 

Figure  3.  Least  squares  regression  vs.  quantile  regression  at  a  probability 
level  a  =  0.75,  before  and  after  changing  one  observation  in  the 

data  set .  4 

Figure  4.  Least  squares  vs.  quantile  regression  at  a  probability  level  a  =  0.60.  5 

Figure  5.  Example  of  multiple  optimal  solutions  for  problem  SqR .  33 

Figure  6.  Example  A:  Computing  times  for  solving  Du  with  three  different 

algorithms  (subgradient,  coordinate  descent,  and  cutting  plane 

methods),  for  increasing  sample  sizes  v .  77 

Figure  7.  Example  A:  Primal  versus  dual  methods  computing  times  for 

increasing  sample  sizes  v,  in  logarithmic  scale .  78 

Figure  8.  Example  B:  Engel  data  set .  83 

Figure  9.  Example  B:  Least  squares  and  quantile  regression  functions,  for 

varying  a .  84 

Figure  10.  Example  B:  Least  squares  and  superquantile  regression  functions, 

for  varying  a .  86 

Figure  11.  Example  B:  Regression  functions  for  linear  and  quadratic  models.  88 

Figure  12.  Example  B:  Least  squares,  quantile,  and  superquantile  regression 

functions  for  the  quadratic  model  /2(x)  =  c0  +  C\X  +  c2x2 .  89 

Figure  13.  Example  C:  Stack  loss  data  scatterplot  matrix .  91 

Figure  14.  Example  C:  Regression  functions  for  linear  and  quadratic  models.  94 

Figure  15.  Example  D:  Regression  lines  for  model  Co  +  crlvArlv .  97 

Figure  16.  Example  E:  U.S.  Navy  helicopter  pilots  data  scatterplot  matrix.  .  99 

Figure  17.  Example  E:  Superquantile  regression  applied  to  the  U.S.  Navy 

helicopter  pilots  data .  101 

Figure  18.  Example  F:  Portuguese  submariners  effort  index  against  their  ages 

and  years  they  have  the  submariners  insignia .  103 

Figure  19.  Example  F:  Portuguese  submariners  effort  index  scatterplot  matrix.  105 

Figure  20.  Example  F:  Submariners  ages  against  the  number  of  years  they 

have  the  submariners  insignia .  106 

Figure  21.  Example  F:  Regression  lines  for  model  fi(x)  =  cq  +  Cdoiphins^doiphins-  108 
Figure  22.  Example  F:  Least  squares,  quantile  and  superquantile  regression 
functions  for  linear  model  f\.  An  asterisk  indicates  that  the  0.75- 
superquantile  regression  function  was  obtained  after  reversing  the 
orientation  of  the  original  problem .  109 


IX 


Figure  23. 

Figure  24. 
Figure  25. 


Example  F :  Different  «-super quantile  regression  functions  for  lin¬ 
ear  model  fi.  An  asterisk  indicates  that  the  0.75-superquantile 
regression  function  was  obtained  after  reversing  the  orientation  of 

the  original  problem . 

Example  F:  Quadratic  regression  models  /3  at  probability  level 

a  =  0.75 . 

Example  F:  Cook’s  distances  for  least  squares  and  superquan¬ 
tile  regression  fits  using  quadratic  model  fiix)  =  cq  +  cagea;a ge  + 
^age2*£age:  ^  ^  0.75 . 


110 

111 

113 


x 


LIST  OF  TABLES 


Table  1.  Example  A:  Computing  times  (sec.)  for  solving  p  for  increas¬ 
ingly  larger  sample  sizes  v .  72 

Table  2.  Example  A:  Solution  vectors  and  computing  times  (sec.)  for  vary¬ 
ing  number  of  observations  v,  integration  rules  for  solving  P^m 

as  well  as  number  of  integration  subintervals  /i .  74 

Table  3.  Example  A:  Computing  times  (sec.)  for  solving  Dv  using  different 

implementations  of  the  dual  methods  for  increasing  sample  sizes  v.  76 
Table  4.  Example  A:  Solution  vectors  and  computing  times  (sec.)  for 
the  superquantile  regression  problem  with  varying  computational 

methods,  and  sample  sizes  v .  79 

Table  5.  Example  A:  Solution  vectors  and  computing  times  (sec.)  for  solv¬ 
ing  Dv  when  implementing  the  subgradient  method  with  varying 

probability  levels  a  and  number  of  observations  v .  79 

Table  6.  Example  A:  Solution  vectors  and  computing  times  (sec.)  for  solv¬ 
ing  Du  when  implementing  the  coordinate  descent  method  with 

varying  probability  levels  a  and  number  of  observations  v .  80 

Table  7.  Example  A:  Solution  vectors  and  computing  times  (sec.)  for  solv¬ 
ing  Dv  when  implementing  the  cutting  plane  method  with  varying 

probability  levels  a  and  number  of  observations  v .  80 

Table  8.  Example  B:  Solution  vectors  (c0,  Ci)  and  coefficients  of  determi¬ 
nation  for  the  linear  model  of  the  form  f\  (x)  =  Cq  +  c.\X,  and 
solution  vectors  (c0,ci,c2)  and  coefficients  of  determination  for 

the  quadratic  model  of  the  form  /2(x)  =  Co  +  C\X  +  c2x2 .  87 

Table  9.  Example  C:  Regression  vectors,  R2a ,  and  R2a  Adj  for  the  linear  model 
fi  which  includes  all  explanatory  variables,  and  for  different  prob¬ 
ability  levels  a .  92 

Table  10.  Example  C:  Coefficients  of  determination  for  different  probability 

levels  a .  92 

Table  11.  Example  C:  Regression  vectors,  A2,  and  R2  Adj  f°r  linear  and 
quadratic  models,  /2  and  fs,  respectively,  for  varying  probability 

levels  a .  93 

Table  12.  Example  D:  Approximate  least  squares  (LS),  quantile,  and  su¬ 
perquantile  regression  vectors  and  A2  Adj  f°r  models  /i,  /2,  and 

fs . 96 

Table  13.  Example  E:  Regression  vectors,  i?2,  and  R?a  Adj  for  model  fi(x)  = 

Co  +  Cyears^years  +  cBmi^bmi  and  f2(x)  =  c0  +  cyearsa;years  at  varying 

probability  levels  a .  98 

Table  14.  Example  F:  Regression  vectors  and  R2  for  linear  models  fi  and 

/2,  at  a  fixed  probability  level  a  =  0.75 .  107 


xi 


Table  15.  Example  F :  Regression  vectors  and  R2a  for  quadratic  model  /3(x)  = 

C0  +  Cagea;age  +  Cage2^age>  witl1  a  =  °-75 .  109 

Table  16.  Example  G:  Approximate  regression  vectors  and  coefficients  of  de¬ 
termination  for  superquantile  regression  with  varying  a  and  least 

squares  (LS)  regression .  114 

Table  17.  Example  G:  Statistics  of  fi(X)  and  /2(AT)  as  compared  to  those 
of  Y.  Columns  3-10  show  mean,  standard  deviation,  superquan¬ 
tiles  at  0.75,  0.9,  0.99,  0.999,  probability  of  failure,  and  buffered 

probability  of  failure,  respectively. .  117 

Table  18.  Example  H:  Approximate  95%  confidence  intervals  when  tracking 
qO'g(Y(-))  near  x  =  (0.5,  0.5)  using  shrinking  sampling  ranges  for 
X.  The  correct  value  g0.9(E((0.5, 0.5)))  =  1.378 .  118 


ACKNOWLEDGMENTS 


I  would  like  to  thank  my  advisor,  Dr.  Johannes  Royset,  for  his  time,  patience, 
and  guidance  throughout  this  arduous  but  rewarding  journey.  I  am  grateful  for  having 
the  chance  to  work  with  yon  and  for  the  knowledge  yon  transmitted  along  this  process. 
Even  when  I  returned  home  to  serve  the  Portuguese  Navy,  yon  were  tireless.  I  am  so 
glad  yon  did  not  give  up  on  me.  Muito  obrigada! 

Also,  I  also  want  to  express  my  gratitude  to  the  other  members  of  my  com¬ 
mittee,  Dr.  Carlos  Borges,  Dr.  Lyn  Whitaker,  Dr.  Javier  Salmeron,  and  Dr.  Ned 
Dimitrov.  I  am  sincerely  grateful  to  you  for  sharing  your  comments,  suggestions,  and 
revisions  during  the  completion  of  this  dissertation.  Dr.  Matt  Carlyle  and  Dr.  Ron 
Flicker,  along  with  the  other  Operations  Research  faculty  of  the  Naval  Postgraduate 
School,  thank  you  for  allowing  me  the  opportunity  to  learn  from  you.  I  am  forever 
indebted  to  Dr.  R.  Tyrrell  Rockafellar  for  his  expertise,  and  Dr.  Stan  Uryasev  for  his 
support  with  Portfolio  Safeguard  implementations. 

My  fellow  Ph.D.  students,  Jay  Foraker,  Jesse  Pietz,  Dick  McGrath,  and  Chris¬ 
tian  Klaus,  or  shall  I  say  Dr.  Jay,  Dr.  Jesse,  Dr.  Dick,  and  Dr.  Christian,  congrats  on 
your  achievements  and  I  am  grateful  for  your  friendship  and  encouragement  during 
some  difficult  moments.  Matt  Hawks,  Gary  Lazzaro,  and  Austin  Wang,  just  keep  up 
the  hard  work.  You  will  do  great! 

This  achievement  is  only  possible  because  the  Portuguese  Navy  believed  in 
me  and  I  will  do  my  very  best  to  show  that  this  level  of  education  is  a  strategy  that 
should  be  constantly  pursued. 

I  would  like  to  thank  my  family  and  friends  for  their  constant  care  and  support. 
But  most  of  all,  I  am  grateful  to  you,  my  husband,  Paulo.  You  gave  me  courage,  and 
you  showed  me  (a  few  times)  that  giving  up  is  not  for  me.  Your  constant  support 
made  me  strong  to  endure  the  ups  and  downs  we  experienced  during  this  memorable 
journey.  And  I  certainly  will  not  forget  when  you  took  care  of  our  baby  when  we 


returned  to  Monterey  for  almost  two  months,  in  July  2014.  I  am  glad  you  were  able 
to  witness  all  those  great  baby  milestones  and  I  am  so  proud  of  you  as  a  Submariner, 
as  a  Husband,  and  now  also  as  a  great  Dad. 

And  finally,  this  dissertation  is  for  you,  my  baby  girl,  Isabel.  I  love  having 
you  in  my  arms,  and  although  you  are  still  too  young  to  understand  what  a  hug  is, 
your  warmth  has  given  me  the  strength  and  audacity  to  keep  moving  forward  and 
complete  this  goal  I  have  set  for  myself. 


EXECUTIVE  SUMMARY 


Analysts  are  often  concerned  with  upper-tail  realizations  of  random  variables  de¬ 
scribing  loss,  cost,  damage  of  a  system  and  attempt  to  approximate  such  loss  random 
variables  in  terms  of  explanatory  random  variables  that  are  more  accessible  in  some 
sense.  We  develop  a  novel  regression  framework  that  naturally  extends  least  squares 
and  quantile  regressions  to  contexts  where  an  analyst  seeks  to  assess  regression  errors 
not  by  squaring  them,  as  in  the  case  of  least  squares  regression,  or  by  looking  at  their 
signs,  as  in  the  case  of  quantile  regression,  but  by  weighing  larger  errors  increasingly 
heavily  in  a  way  consistent  with  a  coherent  and  averse  risk  measure,  the  superquantile 
risk  measure  (also  called  conditional  value-at-risk). 

In  contrast  to  other  generalized  regression  techniques  that  approximate  condi¬ 
tional  superquantiles  by  various  combinations  of  conditional  quantiles,  this  framework 
for  superquantile  regression  is  the  first  attempt  to  use  superquantiles  directly  in  a  re¬ 
gression  model.  The  only  assumption  we  require  is  that  the  involved  random  variables 
have  finite  second  moment.  We  rely  on  the  superquantile-based  risk  quadrangle  and 
use  the  corresponding  relations  between  measures  of  deviation,  risk,  and  error  applied 
to  the  superquantile  as  the  statistic  to  obtain  superquantile  regression  functions  as 
optimal  solutions  of  an  error  minimization  problem.  We  develop  the  fundamental 
theory  for  superquantile  regression  and  build  an  alternative  problem,  the  deviation- 
based  superquantile  regression  problem,  which  determines  the  regression  coefficients 
by  minimizing  a  measure  of  deviation  as  opposed  to  a  measure  of  error,  leading  to 
computational  advantages  in  problem  size  and  simplification  of  the  objective  function. 
We  examine  existence  and  uniqueness  of  the  obtained  regression  functions  as  well  as 
consistency  and  stability  of  the  regression  functions  under  perturbations  due  to  pos¬ 
sible  measurement  errors  and  from  approximating  empirical  distributions  generated 
by  samples  of  the  underlying  data.  We  develop  rate  of  convergence  results  under  mild 
assumptions. 
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In  this  dissertation,  we  construct  a  model  validation  technique  by  extending 
the  concept  of  coefficient  of  determination  used  in  least  squares  regression  to  both 
quantile  and  superquantile  regression.  We  show  that  these  coefficients  of  determina¬ 
tion  are  bounded  between  0  and  1,  with  values  near  1  preferred,  and  we  also  demon¬ 
strate  that  the  superquantile  regression  problem  in  fact  maximizes  the  coefficient  of 
determination  when  it  aims  to  minimize  the  error  of  the  loss  random  variable  by  wisely 
selecting  the  regression  coefficients.  Since  adding  explanatory  random  variables  pos¬ 
sibly  increases  the  coefficient  of  determination,  we  define  an  adjusted  coefficient  of 
determination  for  quantile  and  superquantile  regression.  Another  validation  analy¬ 
sis  tool  that  we  develope  is  the  concept  of  Cook’s  distance  applied  to  quantile  and 
superquantile  regression. 

We  present  two  classes  of  computational  methods  for  solving  superquantile 
regression  problems.  The  first  computational  method  is  denoted  primal  method, 
where  we  minimize  the  superquantile  deviation  measure  using  analytical  integration 
or  numerical  integration  schemes.  The  second  computational  method  is  based  on  the 
dualization  of  risk.  We  build  a  new  superquantile  regression  problem  by  using  the 
expression  of  risk  and  deviation.  We  compare  the  complexity  of  the  methods  and 
demonstrate  which  ones  are  more  efficient  according  to  the  data  size  and  show  that 
dual  methods  are  superior  and  only  marginally  slower  than  methods  for  least  squares 
regression. 

Finally,  we  present  a  series  of  numerical  examples  that  show  some  of  the  ap¬ 
plication  of  superquantile  regression,  such  as  superquantile  tracking  and  surrogate 
estimation,  that  we  encounter  in  the  areas  of  financial  risk  management,  military 
applications,  reliability  engineering,  and  uncertainty  quantification.  We  compare 
computational  methods  by  presenting  their  runtimes  and  see  how  the  coefficient  of 
determination  and  the  adjusted  one  can  be  relevant  in  assessing  the  goodness  of  fit 
of  the  obtained  regression  models. 
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I.  INTRODUCTION 

A.  MOTIVATION  AND  BACKGROUND 

One  of  the  major  concerns  among  analysts  is  how  to  address  random  variables 
describing  possible  “cost,”  loss,  and  “damage,”  but  for  which  there  is  incomplete 
distributional  information  available.  A  possibility  is  to  attempt  to  approximate  such  a 
loss  random  variable  by  a  combination  of  explanatory  random  variables  that  are  more 
accessible  in  some  sense.  This  situation  naturally  leads  to  least  squares  regression 
and  related  models  that  estimate  conditional  expectations.  While  such  models  are 
adequate  in  many  situations,  they  fall  short  in  contexts  where  a  decision  maker  is  risk 
averse,  i.e. ,  is  more  concerned  about  upper-tail  realizations  of  the  loss  random  variable 
than  average  loss,  and  views  errors  asymmetrically  with  underestimating  losses  being 
considered  more  prejudicial  than  overestimating. 

Another  approach  is  based  on  quantile  regression  (see  Koenker,  2005;  Gilchrist, 
2008  and  references  therein),  which  accommodates  risk-averseness  and  an  asymmetric 
view  of  errors  by  estimating  conditional  quantiles  at  a  certain  probability  level  such 
as  those  in  the  tail  of  the  conditional  distribution  of  the  loss  random  variable.  While 
suitable  in  some  contexts,  quantile  regression  only  deals  with  the  signs  of  the  errors 
and  therefore  might  be  overly  “robust”  in  the  sense  that  portions  of  a  data  set  can 
change  without  necessarily  impacting  the  best-fit  regression  function,  as  illustrated 
below. 

In  this  dissertation,  we  focus  on  contexts  where  a  decision  maker  is  concerned 
about  upper-tail  realizations  of  the  loss  random  variable,  and  errors  are  not  only  seen 
asymmetrically  but  their  magnitude  is  also  taken  into  account.  Of  course,  a  parallel 
development  with  an  opposite  orientation,  focused  on  profits  and  gains,  and  concerns 
about  overestimating  instead  of  underestimating  is  also  possible  but  not  considered 
in  this  dissertation. 

Before  we  proceed  with  the  literature  review,  we  analyze  one  simple  example. 
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We  consider  a  loss  random  variable  Y  and  an  available  explanatory  random  variable 
X.  Since  the  distribution  of  the  loss  random  variable  Y  might  not  be  fully  known,  it 
may  be  beneficial  to  approximate  Y  by  this  random  vector  X. 

For  this  example,  we  have  a  table  of  50  pairs  of  observations  available,  {ad,  y1}, 

with  i  =  1, _ » 50,  as  seen  in  the  scatter  plot  in  Figure  1.  We  consider  a  regression 

function  of  the  form  f(x)  =  Co  +  ex,  with  Co,  c  E  JR.  This  numerical  example  is 
artificially  designed  to  show  how  different  regression  techniques  react  to  small  data 
changes. 


Figure  1.  Scatter  plot  of  the  data  for  the  constructed  example. 

Figure  2(a)  gives  the  least  squares  and  0.75-quantile  regression  functions.  We 
observe  that  the  0.75-quantile  regression  function  divides  the  data  set  into  two,  such 
that  25%  of  the  observations  remain  above  the  obtained  regression,  while  the  remain¬ 
ing  75%  lay  below. 
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(a)  Before  any  changes  in  the  data  set. 


x 


(b)  After  shifting  six  observations  upwards. 

Figure  2.  Least  squares  regression  vs.  quantile  regression  at  a  probability  level 
a  =  0.75,  before  and  after  some  changes  in  the  data  set. 
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In  Figure  2(b),  we  see  how  the  least  squares  and  the  quantile  regression  models 
adjust  to  changes  in  the  data  set,  denoted  by  the  red  dots.  Notice  that  the  observa¬ 
tions  are  moved  upwards  without  changing  their  position  relative  to  the  0.75-quantile 
regression  curve.  The  balance  of  25%  of  the  observations  above  and  75%  of  them 
below  the  quantile  regression  curve  has  not  been  compromised.  Therefore,  as  we  can 
observe  in  Figure  2(b),  the  quantile  regression  curve  does  not  shift  after  modifying 
the  six  observations.  Such  robustness  is  sometimes  desirable,  but  at  other  times  there 
is  the  need  for  responsiveness.  In  comparison,  the  least  squares  regression  function 
shifts  upwards  reacting  to  the  data  changes. 


X 


Figure  3.  Least  squares  regression  vs.  quantile  regression  at  a  probability  level 
a  =  0.75,  before  and  after  changing  one  observation  in  the  data  set. 


Changing  only  one  observation,  as  shown  in  Figure  3,  we  note  that  the  ob¬ 
tained  quantile  regression  function  changes  its  slope,  while  the  least  squares  regression 
function  hardly  changes.  If  we  shift  this  observation  even  further,  the  change  in  the 
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Figure  4.  Least  squares  vs.  quantile  regression  at  a  probability  level  a  =  0.60. 

slope  for  the  quantile  regression  function  is  even  more  significant.  Once  again  the 
least  squares  regression  model  hardly  changes.  If  we  change  this  observation  in  red 
even  further  upwards,  we  would  notice  no  more  changes  in  the  quantile  regression 
function  obtained  in  Figure  3,  since  the  balance  of  the  data  above  and  below  the 
quantile  regression  would  no  longer  be  compromised. 

Quantile  regression  is  a  robust  regression  technique,  but  its  sensitivity  to 
changes  in  data  might  sometimes  be  too  small  as  indicated  above.  Other  times  the 
sensitivity  might  be  too  large  as  illustrated  above  where  the  change  of  a  single  data 
point  triggers  a  jump  in  the  regression  curve.  On  the  contrary,  the  least  squares  re¬ 
gression  is  more  stable,  with  smooth  adjustments  in  the  curve  comparable  to  changes 
in  the  data  set. 

As  another  motivation  to  this  novel  regression  technique,  we  consider  a  real- 
world  data  set:  the  Portuguese  Navy  submariners  effort  index,  provided  by  the  Por¬ 
tuguese  Navy  Submarine  Squadron.  In  this  data  set  we  seek  to  estimate  the  random 
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variable  Y  that  represents  the  effort  index  of  the  submariners.  This  index  was  cre¬ 
ated  as  a  decision  tool  to  support  human  resource  management  inside  the  Submarine 
Squadron.  It  allows  planners  to  assess  which  submariners  are  “due”  for  another  mis¬ 
sion. 

In  Figure  4,  we  have  103  observations  of  number  of  years  since  a  submariner 
has  gained  the  insignia  of  the  Portuguese  submarine  service  (^dolphins)  against  the 
submariners  effort  index  (Y).  In  red  and  blue  colors,  we  see  two  regression  functions, 
the  least  squares  and  the  0.60-quantile  regression,  respectively.  The  0.60-quantile 
regression  fit  analyzes  the  sign  of  the  errors  defined  as  the  differences  between  the 
loss  random  variable  Y  and  the  chosen  linear  model.  Instead  of  only  regarding  the 
signs  of  these  errors,  we  want  to  also  account  for  their  magnitudes,  namely  we  want 
to  analyze  the  average  of  the  40%  highest  effort  indices. 

These  two  examples  motivate  the  need  to  move  beyond  least  squares  and  quan¬ 
tile  regression  and  develop  superquantile  regression.  They  illustrate  how  a  regression 
technique  such  as  the  quantile  regression,  which  accommodates  risk-averseness  and  an 
asymmetric  view  of  errors,  may  not  be  suitable  in  some  contexts  where  the  decision 
maker  is  also  concerned  with  the  magnitude  of  those  errors  as  well  as  the  “average 
worst-case”  behavior. 

B.  CONNECTIONS  WITH  THE  LITERATURE 

A  quantile  corresponds  to  “value-at-risk”  (VaR)  in  financial  terminology  and 
relates  to  “failure  probability”  in  engineering  terms.  Quantile  regression  informs 
the  decision  maker  about  these  quantities  conditional  on  values  of  the  explanatory 
random  vector  X.  However,  a  quantile  is  not  a  coherent  measure  of  risk  in  the 
sense  of  Artzner  et  al.  (1999)  (see  also  Delbaen,  2002);  it  fails  to  be  subadditive. 
Consequently,  a  quantile  of  the  sum  of  two  random  variables  may  exceed  the  sum 
of  the  quantiles  of  each  random  variable  at  the  same  probability  level,  which  runs 
counter  to  our  understanding  of  what  “risk”  should  express.  Moreover,  quantiles  cause 
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computational  challenges  when  incorporated  into  decision  optimization  problems  as 
objective  function,  failure  probability  constraint,  or  chance  constraint.  The  use  of 
quantiles  and  the  closely  related  failure  probabilities  is  therefore  problematic  in  risk- 
averse  decision  making;  see  Artzner  et  al.  (1999),  Rockafellar  and  Uryasev  (2000), 
Rockafellar  and  Royset  (2010),  Krokhmal  et  al.  (2011),  and  Rockafellar  and  Uryasev 
(2013)  for  a  detailed  discussion. 

A  superquantile  of  a  random  variable,  also  called  “conditional  value-at-risk” 
(CVaR),  average  value-at-risk,  and  expected  shortfall,  is  an  “average”  of  certain  quan¬ 
tiles  as  described  further  below.  We  prefer  the  application-neutral  name  “superquan¬ 
tile”  when  deriving  methods  applicable  broadly.  This  is  a  coherent  measure  of  risk 
well  suited  for  risk-averse  decision  making  and  optimization;  see  Wang  and  Urya¬ 
sev  (2007)  for  its  application  in  financial  engineering,  Kalinchenko  et  al.  (2011)  for 
military  applications,  and  Rockafellar  and  Royset  (2010)  for  use  in  reliability  engi¬ 
neering.  While  this  risk  measure  has  reached  prominence  in  risk-averse  optimization, 
there  has  been  much  less  work  on  regression  techniques  that  are  consistent  with  it  in 
some  sense. 

The  foundation  of  least  squares  and  quantile  regression  is  the  fact  that  mean 
and  quantiles  minimize  the  expectation  of  certain  convex  random  functions.  A  nat¬ 
ural  extension  to  superquantile  regression  could  then  possibly  involve  determining  a 
random  function  that  when  minimizing  its  expectation,  we  obtain  a  superquantile. 
However,  such  a  random  function  does  not  exist  (as  discussed  in  Gneiting,  2011;  Chun 
et  ah,  2012),  which  has  led  to  studies  of  indirect  approaches  to  superquantile  tracking 
grounded  in  quantile  regression. 

For  a  random  variable  with  a  continuous  cumulative  distribution  function, 
a  superquantile  equals  a  conditional  expectation  of  the  random  variable  given  real¬ 
izations  no  lower  than  the  corresponding  quantile.  Utilizing  this  fact,  studies  have 
developed  kernel-based  estimators  for  the  conditional  probability  density  functions, 
which  are  then  integrated  and  inverted  to  obtain  estimators  of  conditional  quantiles. 
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An  estimator  of  the  conditional  superquantile  is  then  finally  constructed  by  integrat¬ 
ing  the  density  estimator  over  the  interval  above  the  quantile  (Scaillet,  2005;  Cai 
&  Wang,  2008)  or  forming  a  sample  average  (Kato,  2012).  These  studies  also  in¬ 
clude  asymptotic  analysis  of  the  resulting  estimators  under  a  series  of  assumptions, 
including  that  the  data  originates  from  certain  time  series. 

A  superquantile  of  a  random  variable  is  defined  in  terms  of  an  integral  of 
corresponding  quantiles  with  respect  to  the  probability  level.  Since  the  integral  is 
approximated  by  a  weighted  sum  of  quantiles  across  different  probability  levels,  an 
estimator  of  a  conditional  superquantile  emerges  as  the  sum  of  conditional  quantiles 
obtained  by  quantile  regression;  see  Peracchi  and  Tanase  (2008),  and  Leorato  et  al. 
(2012),  which  also  show  asymptotic  results  under  a  set  of  assumptions  including  the 
continuous  differentiability  of  the  cumulative  distribution  function  of  the  conditional 
random  variables.  Similarly,  Chun  et  al.  (2012)  utilizes  the  integral  expression  for  a 
superquantile,  but  observes  that  a  weighted  sum  of  quantiles  is  an  optimal  solution 
of  a  certain  minimization  problem;  see  Rockafellar  and  Uryasev  (2013).  Analogous  to 
the  situation  in  least  squares  and  quantile  regression,  an  optimization  problem  yields 
an  estimator  of  a  conditional  superquantile.  Though,  in  contrast  to  the  case  of  least 
squares  and  quantile  regression,  the  estimator  is  “biased”  due  to  the  error  induced  by 
replacing  an  integral  by  a  finite  sum.  Under  a  linear  model  assumption,  Chun  et  al. 
(2012)  also  constructs  a  conditional  superquantile  estimator  using  an  appropriately 
shifted  least  squares  regression  curve  based  on  quantile  estimates  of  residuals.  In  both 
cases,  asymptotic  results  are  obtained  for  a  homoscedastic  linear  regression  model. 
Under  the  same  model,  Trindade  et  al.  (2007)  studies  “constrained”  regression,  where 
the  error  random  variable  Z$  —  Y  —  /( X)  is  minimized  in  some  sense,  for  example 
in  terms  of  least  square  or  absolute  deviation,  subject  to  a  constraint  that  limits  a 
superquantile  of  Zf.  While  this  approach  does  not  lead  to  superquantile  regression  in 
the  sense  we  derive  in  this  dissertation,  it  highlights  the  need  for  alternative  techniques 
for  regression  that  incorporate  superquantiles  in  some  manner. 


The  need  for  moving  beyond  classical  regression  centered  on  conditional  expec¬ 
tations  is  now  well  recognized  and  has  driven  even  further  research  towards  estimating 
conditional  distribution  function,  i.e.,  P  {Y (. x )  <  y}  for  all  y  E  JR,  using  nonparamet- 
ric  kernel  estimators  (see  for  example  Hall  &  Muller,  2003)  and  transformation  models 
(see  for  example  Hothorn  et  ah,  2014).  We  denote  by  Y (x)  the  conditional  random 
variable  Y  given  that  X  =  x  E  JRn .  Of  course,  conditional  distribution  functions 
provide  the  “full”  information  about  Y(x )  including  its  quantiles  and  super  quantiles, 
and  therefore  also  provide  a  means  to  inform  a  risk-averse  decision  maker.  In  this 
dissertation,  however,  we  directly  focus  on  superquantiles,  which  we  believe  deserve 
special  attention  due  to  their  prominence  in  risk  analysis. 

A  framework  for  generalized  regression  is  laid  out  in  Rockafellar  et  al.  (2008), 
and  Rockafellar  and  Uryasev  (2013),  and  regression  functions  are  obtained  as  optimal 
solutions  of  optimization  problems  of  the  form  min fS(Zf),  where  £  is  a  measure  of 
error  and  /  is  restricted  to  a  certain  class  of  functions  such  as  the  affine  functions. 
Least  squares  regression  is  obtained  by  £(Zf )  =  E[Z'j\,  quantile  regression  with  the 
Koenker-Bassett  measure  of  error,  but  many  other  possibilities  exist.  While  it  is  not 
possible  to  determine  a  measure  of  error  that  is  of  the  expectation  type  and  yields 
a  superquantile,  in  Section  II. A  we  show  that  when  allowing  for  a  broader  class  of 
functionals,  a  measure  of  error  that  generates  a  superquantile  is  indeed  available. 
Such  a  measure  of  error  is  also  hinted  at  in  Rockafellar  and  Royset  (2014b),  but  this 
dissertation  as  well  as  the  supporting  paper  by  Rockafellar  et  ah  (2014)  gives  the 
first  comprehensive  treatment.  In  contrast  to  previous  studies  towards  superquan¬ 
tile  tracking,  which  utilize  indirect  approaches  and  quantile  regression,  we  here  offer 
a  natural  extension  of  least  squares  and  quantile  regression.  We  replace  the  mean- 
squares  and  Koenker-Bassett  (cf.  eq.  (II. 9))  error  measures  by  a  new  error  measure, 
and  then  simply  minimize  that  error  of  Zf  to  obtain  a  regression  function.  Un¬ 
der  few  assumptions,  we  establish  the  existence  of  a  regression  function,  discuss  its 
uniqueness,  and  examine  stability  under  perturbations  of  the  distribution  of  (X,  Y) 
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for  example  caused  by  sampling.  We  omit  a  discussion  of  simple  linear  models  with 
independent  and  identically  distributed  (iid)  noise  as  we  believe  that  there  is  little 
need  for  quantile  and  superquantile  regression  in  such  contexts  as  least  squares  re¬ 
gression  with  an  appropriate  shift  suffices.  In  fact,  we  do  not  separate  models  into 
(additive)  deterministic  and  stochastic  terms.  In  many  applications,  especially  in  the 
area  of  uncertainty  quantification,  heteroscedasticity  and  dependence  are  prevalent 
making  linear  iid  and  additive  models  of  little  value. 

C.  SCOPE  OF  DISSERTATION 

In  this  dissertation,  we  focus  on  two  distinct  situations  where  the  importance 
of  a  novel  regression  methodology  becomes  apparent.  We  consider  a  loss  random 
variable  Y  for  which  there  is  incomplete  distributional  information  available,  and  an 
explanatory  random  variable  X  that  is  more  accessible  in  some  sense. 

We  denote  the  first  situation  and  the  one  we  address  more  often  during  this 
dissertation  by  surrogate  estimation.  It  usually  occurs  when  the  explanatory  random 
variable  is  beyond  our  direct  control,  but  the  dependence  between  the  loss  and  the 
explanatory  random  variable  makes  us  hopeful  that,  for  a  carefully  selected  regression 
function,  such  explanatory  random  variable  may  serve  as  a  surrogate  for  the  loss 
random  variable.  When  the  distribution  of  the  explanatory  random  variable  is  known, 
at  least  approximately,  and  the  regression  function  has  been  determined,  then  the 
distribution  of  f(X)  is  usually  easily  accessible.  That  distribution  may  then  serve 
as  input  to  further  analysis,  simulation,  and  optimization  in  place  of  the  unknown 
distribution  of  the  loss  random  variable  Y .  Such  surrogate  estimation  may  arise 
in  numerous  contexts.  “Factor  models”  in  financial  investment  applications  are  a 
result  of  surrogate  estimation  (see  for  example  Connor,  1995;  Knight  et  al.,  2005), 
where  the  random  variable  we  aim  to  estimate  may  be  the  loss  associated  with  a 
particular  asset  and  the  explanatory  variable  a  vector  describing  a  small  number  of 
macroeconomic  “factors.”  “Uncertainty  quantification”  (see  for  example  Lee  &  Chen, 
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2009;  Eldred  et  al.,  2011)  considers  the  output  of  a  system  described  by  a  random 
variable,  for  example  measuring  damage,  and  estimates  its  moments  and  distribution 
from  observed  realizations  as  well  as  knowledge  about  the  distribution  of  the  input 
to  the  system  characterized  by  an  explanatory  random  vector.  A  main  approach  here 
centers  on  surrogate  estimation  with  the  obtained  regression  function  serving  as  an 
estimate  of  the  loss  random  variable. 

Another  situation  arises  when  the  primary  concern  is  with  the  conditional 
loss  given  that  the  explanatory  random  variable  X  takes  on  specific  values.  We  aim 
to  select  these  values  judiciously  in  an  effort  to  minimize  the  conditional  loss.  We 
denote  this  second  situation  by  superquantile  tracking.  Of  course,  “minimizing”  Y(x) 
is  not  well-defined  and  a  standard  approach  is  to  minimize  a  risk  measure  of  Y(x); 
see  for  example  Krokhmal  et  al.  (2011),  and  Rockafcllar  and  Uryasev  (2013).  An 
attractive  choice  is  to  use  a  superquantile  measure  of  risk,  which  has  nice  properties 
and  is  also  computationally  approachable.  While  in  some  contexts  a  superquantile  of 
the  conditional  loss  can  be  evaluated  easily  for  any  specific  value  of  the  explanatory 
random  vector,  there  are  numerous  situations,  especially  beyond  the  financial  domain, 
where  only  a  table  of  realizations  of  conditional  loss  is  available  for  various  values  of 
the  explanatory  random  vector.  In  the  latter  situation,  there  is  a  need  for  building 
an  approximating  model,  based  on  the  data,  for  the  relevant  superquantile  of  the 
conditional  loss  as  a  function  of  the  explanatory  variables. 

D.  CONTRIBUTIONS 

The  main  contribution  of  this  dissertation  is  the  development  of  a  novel  re¬ 
gression  framework  that  naturally  extends  least  squares  and  quantile  regressions  to 
contexts  where  one  seeks  to  assess  regression  errors  not  by  squaring  them,  as  in  the 
case  of  least  squares  regression,  or  by  looking  at  their  signs,  as  in  the  case  of  quantile 
regression,  but  by  weighing  larger  levels  of  underestimation  increasingly  heavily  in  a 
manner  consistent  with  superquantiles. 
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This  generalized  regression  technique  is  the  first  attempt  to  use  superquantiles 
directly  in  the  regression  model  as  opposed  to  an  approximation  of  conditional  quan¬ 
tiles.  We  develop  the  fundamental  theory  for  the  new  regression  technique  and  deal 
with  issues  encountered  in  any  generalized  regression  framework,  such  as  existence 
and  possible  uniqueness  of  the  obtained  regression  functions.  We  discuss  consistency 
and  stability  of  these  regression  functions  under  perturbations  due  to  possible  mea¬ 
surement  errors  and  approximating  empirical  distributions  generated  by  samples  of 
the  underlying  distribution.  And  we  also  examine  rate  of  convergence  results  under 
mild  assumptions.  We  present  means  of  assessing  the  goodness  of  fit  of  the  obtained 
quantile  and  superquantile  regression  models,  by  applying  the  concepts  of  coefficient 
of  determination,  adjusted  coefficient  of  determination,  and  Cook’s  distance  to  quan¬ 
tile  and  superquantile  regression  techniques. 

We  develop  two  distinct  classes  of  computational  methods,  one  solving  the 
superquantile  regression  problem  by  means  of  analytical  and  numerical  integration 
techniques,  another  by  relying  on  the  dualization  of  risk  as  a  step  to  build  a  new 
regression  problem  that  we  apply  to  discrete  cases.  We  discuss  complexity  results  of 
both  classes  of  computational  methods,  and  compare  them  to  the  complexity  results 
for  least  squares  and  quantile  regressions. 

We  present  a  series  of  numerical  examples  from  the  areas  of  financial  invest¬ 
ment,  military  applications,  reliability  engineering,  and  uncertainty  quantification. 

E.  DISCLAIMER 

The  information  presented  and  views  expressed  in  this  dissertation  do  not 
reflect  the  official  policy  or  position  of  the  U.S.  Navy,  the  U.S.  Department  of  De¬ 
fense,  the  U.S.  Government,  the  Portuguese  Navy,  and  the  Portuguese  Ministry  of 
National  Defense  or  the  Portuguese  Government.  The  data  sets  we  use  in  our  two 
military  applications  numerical  examples  are  obtained  from  unclassified  sources,  and 
are  employed  in  this  dissertation  in  order  to  illustrate  some  interesting  and  meaningful 
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conclusions  from  our  theoretical  results. 

The  first  military  application  example  considers  the  results  of  an  online  survey 
of  winged  Naval  Helicopter  Pilots  of  the  U.S.  Navy;  see  Phillips  (2011)  for  details. 
As  stated  in  Phillips  (2011),  this  study  is  approved  by  the  NPS  Institutional  Review 
Board  (IRB)  and  has  an  IRB  protocol  number:  NPS. 2011. 0053-IR-EP7-A.  The  second 
military  application  example  considers  a  data  set  provided  by  the  Portuguese  Navy 
Submarine  Squadron. 

F.  ORGANIZATION 

Chapter  II  addresses  the  foundations  of  the  superquantile  regression,  as  an 
extension  of  least  squares  and  quantile  regressions.  The  chapter  discusses  the  su¬ 
perquantile  regression  problem,  the  issues  encountered  in  such  generalized  regression 
frameworks,  and  provides  an  approach  for  assessing  the  goodness  of  fit  of  the  obtained 
quantile  and  superquantile  regression  models. 

Chapter  III  develops  two  classes  of  computational  methods  to  solve  superquan¬ 
tile  regression  problems.  The  Erst  denoted  by  primal  method  solves  superquantile 
regression  problems  using  analytical  and  numerical  integration  schemes.  The  second 
which  we  call  the  dual  method  is  based  on  the  dualization  of  risk  and  utilizes  such 
advantages  to  build  a  new  superquantile  regression  problem  with  promising  compu¬ 
tational  performance,  especially  for  large  sample  sizes.  It  also  discusses  complexity 
results  for  the  presented  algorithms. 

Chapter  IV  provides  several  numerical  results  that  illustrate  not  only  the  pri¬ 
mal  and  dual  methods,  but  also  some  of  the  main  applications  of  the  superquantile 
regression,  such  as  superquantile  tracking  and  surrogate  estimation. 

Chapter  V  summarizes  the  theoretical  and  numerical  results,  presents  our 
conclusions  and  suggests  future  research  opportunities. 
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II.  FOUNDATIONS  OF  SUPERQUANTILE 

REGRESSION 

In  this  chapter,  we  develop  a  regression  technique  that  extends  least  squares 
and  quantile  regressions,  centered  on  expectations  and  quantiles,  respectively,  to  one 
that  focuses  on  superquantiles.  This  material  to  a  large  extent  is  based  on  Rockafellar 
et  al.  (2014). 

Section  II. A  describes  measures  of  error,  risk,  deviation,  and  regret,  first  in  the 
context  of  quantile  regression  and  then  for  the  extension  to  superquantile  regression. 
Section  II. B  defines  superquantile  regression  as  the  minimization  of  a  measure  of  er¬ 
ror,  provides  an  alternative  approach  for  solving  superquantile  regression  problems 
based  on  the  measure  of  deviation,  discusses  existence  and  uniqueness  of  the  regres¬ 
sion  function,  and  provides  asymptotic  results.  Section  II. C  proposes  an  approach 
for  assessing  the  goodness  of  fit  of  the  regression  function  obtained  by  quantile  and 
superquantile  regressions,  using  extensions  of  the  definitions  of  coefficient  of  determi¬ 
nation  and  Cook’s  distance. 

A.  QUANTILES,  SUPERQUANTILES,  AND  ERRORS 

While  our  development  centers  on  super  quantiles,  it  is  beneficial  to  maintain 
a  parallel  description  of  quantiles.  As  we  will  see  in  Subsection  II. A. 4,  quantile 
regression  achieved  by  minimizing  a  Koenker- Bassett  error  of  the  random  variable  Zf, 
as  seen  in  Subsection  II. A. 3  in  more  detail,  provides  a  road  map  for  the  construction 
of  superquantile  regression,  which  is  simply  achieved  by  minimizing  another  measure 
of  error.  We  start,  however,  with  definitions  and  assumptions,  and  then  provide  an 
overview  of  the  fundamental  risk  quadrangle,  its  application  to  the  superquantile  as 
the  statistic,  and  finally  we  present  the  corresponding  measures  of  error,  deviation, 
and  regret  of  quantiles  and  superquantiles. 
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1.  Definitions  and  Assumptions 

We  consider  a  loss  random  variable  Y  as  a  function  on  a  probability  space 
(fkJ7,  P),  and  in  our  context,  we  assume  that  Y  has  a  finite  second  moment,  as 
follows 

Y  G  £2  :=  £2(fi,  T,  P)  :={Y  \  Y  is  ^-measurable,  E[Y2]  <  oo}.  (II.l) 

Here  fl  is  a  sample  space  with  uj  e  O  being  a  possible  outcome;  T  is  an  event 
space;  and  P  is  a  probability  measure  that  assigns  probabilities  to  these  events, 
P  :  P  [0,1], 

We  now  give  some  useful  definitions.  We  consider  the  following  distinct  func¬ 
tionals  on  jC2,  in  the  sense  of  Rockafellar  and  Uryasev  (2013),  that  assign  numerical 
values  to  random  variables,  e.g.,  a  loss  random  variable  Y.  A  measure  of  error  S(Y) 
quantifies  the  “nonzeroness”  in  Y.  The  £2-norm  of  Y  is  a  possible  measure  of  error. 
A  measure  of  risk  TZ{Y)  serves  as  surrogate  for  the  overall  loss  in  Y .  For  example, 
one  could  think  of  7 Z(Y)  =  sup{F}  (the  essential  supremum)  as  such  a  surrogate, 
or  less  conservatively  7 Z(Y)  =  E\Y].  A  measure  of  deviation  T>(Y)  quantifies  the 
“nonconstancy”  as  uncertainty  in  Y ,  and  can  be  seen  as  a  generalization  of  the  stan¬ 
dard  deviation  of  Y.  A  measure  of  regret  V(Y)  quantifies  the  displeasure  of  obtaining 
mix  realizations  of  Y,  which  might  be  better  when  Y  <  0  (representing  “gains”)  or 
worse  when  Y  >  0  (representing  “losses”).  And  a  statistic  5(H)  is  associated  with  Y 
through  £  and  V,  as  described  below. 

According  to  Rockafellar  et  al.  (2008),  we  say  that  a  measure  of  risk  is  coherent 
if  the  following  axioms  hold: 

(i)  7 Z(c)  =  c  for  a  constant  c. 

(ii)  1Z{XY)  =  X1Z{Y)  when  A  >  0  (positive  homogeneity). 

(hi)  7 Z(Y  +  Y')  <  U(Y)  +  7 Z{Y')  (subadditivity). 

(iv)  Tl(y)  <  7 Z(Yr)  when  Y  <  Y1  (monotonicity). 
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This  definition  is  equivalent  to  the  one  described  in  Artzner  et  al.  (1999),  where 
axiom  (i)  is  replaced  by  translation  invariance.  When  we  refer  to  a  coherent  measure 
of  risk,  we  refer  to  the  axioms  listed  above.  The  concept  of  a  coherent  measure  of  risk 
is  important  in  our  context  because  it  follows  the  natural  way  we  think  about  risk, 
where  monotonicity  is  a  requirement.  Moreover,  if  7 Z(Y)  >  i7[Y'],  for  a  nonconstant 
random  variable  Y,  then  77(-)  is  averse. 

According  to  Rockafellar  and  Uryasev  (2013),  a  regular  measure  of  risk  sat¬ 
isfies  the  axiom  (i)  stated  previously,  as  well  as  convexity,  aversity,  and  closedness, 
{Y'\1Z(Y)  <  c}  for  all  constants  c  £  M.  Obviously,  the  expectation  is  not  averse, 
therefore  not  regular. 

Examples  of  measures  of  risk  are  quantiles  and  superquantiles  of  a  loss  random 
variable  Y  at  distinct  probability  levels  a,  as  we  define  below.  For  a  probability 
level  a  G  (0, 1),  the  a-quantile  of  a  random  variable  Y  with  cumulative  distribution 
function  Fy  is  defined  as 

qa{Y)  :=  min  {y  G  1R  \  FY(y )  >  «}  • 

Its  quantiles  are  as  fundamental  to  Y  as  the  distribution  function,  but  are  problem¬ 
atic  to  incorporate  in  risk  analysis  and  optimization  due  to  their  lack  of  coherency 
as  well  as  increased  computational  challenges;  see  Rockafellar  and  Royset  (2014b). 
Superquantiles  have  more  favorable  properties.  For  a  G  [0, 1),  the  a-superquantile  of 
a  random  variable  Y  is  defined  as 

ga(y):=— !—  [ 1  qy{Y)d /?.  (II.2) 

1  ®  J a 

Since  a  superquantile  is  a  coherent  measure  of  risk  (see  Rockafellar  &  Uryasev,  2000; 
Rockafellar  &  Uryasev,  2002)  and  by  virtue  of  being  an  “average”  of  quantiles,  it  is 
also  more  stable  than  a  quantile  in  some  sense,  and  is  well  suited  for  applications. 
For  a  =  1,  we  define  qa(Y )  :=  sup{U}.  Since 

C  Fy\f3)d(3  = 
o 


E[Y],  (II  .3) 


%(Y)  =  /  qy(Y)d/3 
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we  therefore  focus  on  a  G  (0, 1)  throughout  the  dissertation  to  avoid  distractions  by 
these  special  cases. 

Equivalent  to  equation  (II. 2),  we  have  an  even  more  stable  and  conservative 
measure  of  risk,  the  a -second- order  superquantile,  and  it  is  defined  as 

UY):=^—  C  Qp(Y)df3,  (II -4) 

1  ^  J a 

for  a  random  variable  Y  G  C2  and  a  G  (0, 1). 

In  reliability  terminology,  quantiles  and  superquantiles  correspond  to  failure 
and  buffered  failure  probabilities.  The  failure  probability  of  a  loss  random  variable  Y 

is 

p(Y)  P  {Y  >  0}  =  1  —  Fy(0), 

which  corresponds  to 

p(Y)  =  1  —  a  with  a  such  that  qa(Y )  =  0 

if  there  is  no  probability  atom  at  zero.  Analogously  to  the  latter  expression,  the 
buffered  failure  probability  (see  Rockafellar  &  Royset,  2010)  of  a  loss  random  variable 
Y  is  defined  as 

p(Y)  :=  1  —  a  with  a  such  that  qa(Y )  =  0.  (II. 5) 

Requiring  that  p(Y)  <  1  —  a  is  therefore  equivalent  to  the  constraint  qa(Y)  <  0. 
Consequently,  in  applications  with  a  buffered  failure  probability  constraint  on  a  (con¬ 
ditional)  loss  random  variable  Y(x )  as  well  as  when  the  goal  is  to  minimize  a  su¬ 
perquantile  of  Y (x)  directly,  there  is  a  need  to  estimate  qa(Y(x ))  as  a  function  of 
x  G  Mn.  Quantiles  and  superquantiles  are  connected  through  a  trade-off  formula 
that  leads  to  quantile  regression,  as  discussed  in  Subsection  II. A. 3. 

2.  Overview  of  the  Fundamental  Risk  Quadrangle 

The  “Fundamental  Risk  Quadrangle”  is  a  concept  introduced  by  Rockafellar 
and  Uryasev  (2013),  which  establishes  the  connections  between  distinct  measures, 
described  in  Subsection  II. A.  1,  of  a  random  variable  whose  orientation  is  such  that 
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upper-tail  realizations  are  unfortunate  and  low  realizations  are  favorable,  as  described 
in  Chapter  I.  The  interrelationships  of  such  numerical  quantities  allow  distinct  com¬ 
parisons  and  applications  in  various  analyses,  such  as  risk  management. 

Diagram  3  in  Rockafellar  and  Uryasev  (2013)  defines  the  general  relationships 
between  five  properties  of  a  random  variable  Y,  measures  of  error,  risk,  deviation, 
and  regret,  and  the  corresponding  statistic,  as  we  list  below.  We  use  these  general 
relationships  in  the  next  two  subsections. 

Error  measure  =  £(Y)  =  V(Y)  —  E[Y ] 

Risk  measure  =  7 Z(Y)  =  minco{c0  +  V(Y  —  c0)} 

Deviation  measure  =  V(Y)  =  Tl(Y)  —  E[Y ] 

Regret  measure  =  V{Y)  =  £{Y)  +  E[Y] 

Statistic  =  5(1'')  =  argminco{c0  +  V(Y  —  c0)}  =  argminco  {£  (Y  —  c0)} 

We  now  look  at  the  families  of  risk  quadrangles  where  the  expectation  and  the 
quantile  are  the  statistic.  The  following  two  risk  quadrangles  are  described  in  detail 
in  Rockafellar  and  Uryasev  (2013).  We  list  both  quadrangles  for  illustration  and  to 
exemplify  how  one  obtains  least  squares  and  quantile  regressions  by  minimizing  a 
certain  measure  of  error. 


Variance  Version  of  Mean-based  Quadrangle: 

(Example  1’  in  Rockafellar  &  Uryasev,  2013) 

Error  measure  =  £(Y)  =  \E\\'2] 

Risk  measure  =  1Z(Y)  =  E[Y]  +  Xa2(Y ') 
Deviation  measure  =  T>(Y)  =  Aa2(T') 

Regret  measure  =  V(Y)  =  E[Y]  +  A E\Y2} 
Statistic  =  5(1')  =  E[Y]  =  mean 
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Quantile-based  Quadrangle:  (at  any  probability  level  a  G  (0,1)) 
(Example  2  in  Rockafellar  &  Uryasev,  2013) 

Error  measure  =  £a(Y)  =  E  max  {0,  Y}  +  max  {0,  —  E}] 
Risk  measure  =  7 Za(Y)  =  qa(Y)  =  ce-superquantile 
Deviation  measure  =  Va{Y)  =  qa(Y)  —  E[Y ] 

Regret  measure  =  Va(Y)  =  j^E  [max{0,  Y'}] 

Statistic  =  S(Y)  =  qa(Y)  =  u-quantile 


With  the  idea  in  mind  that  one  minimizes  a  measure  of  error  to  obtain  its  correspond¬ 
ing  statistic  in  the  sense  of  the  “Fundamental  Risk  Quadrangle,”  we  realize  that  this 
approach  allows  us  to  naturally  extend  the  existing  foundations  of  least  squares  and 
quantile  regressions  to  create  new  foundations  for  superquantile  regression. 

3.  Quantile  Regret  and  Error  Measures 

Both  a-quantiles  and  a-super quantiles ,  with  a  G  (0,1),  of  a  loss  random 
variable  Y  are  expressed  in  terms  of  an  optimization  problem  involving  the  measure 
of  regret 

Va(Y)  :=  —— — E[max{0,  F}], 

1  —  a 

as  seen  in  Rockafellar  and  Uryasev  (2013).  Quantiles  and  superquantiles  then  follow 
as 


qa{Y)  G  argmin  {c0  +  Va(Y  -  c0)}  (II. 6) 

coG-R 

qa(Y)  =  min  {c0  +  Va{Y  -  c0)}  ,  (II. 7) 

co&R 

where  in  fact  qa(Y)  is  the  lowest  optimal  solution  if  multiple  solutions  exist. 

The  expression  for  qa(Y)  is  the  essential  building  block  for  quantile  regression, 
but  since  we  ultimately  wish  to  go  beyond  the  class  of  constant  functions  as  candidates 
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for  a  regression  function  we  need  to  pass  to  a  measure  of  error  £a  constructed  from 
VQ  by  setting 

£a(Y)  :=Va(Y)-E[Y] 

for  any  loss  random  variable  Y  (with  .E[|Y|]  <  oo).  Direct  application  of  the  definition 
of  the  measure  of  error  and  recognition  that  a  constant  term  in  an  objective  function 
is  immaterial  with  respect  to  the  optimal  solution  gives 


with 


Qa(Y)  e  argmin £a(Y  -  c0),  (II. 8) 

cq  £  R 


£a(Y  ~  C0) 


— Mmax-fO,  Y  —  c0j]  —  E[Y  —  c0] 
a 

a 

- - max{0,  Y  —  Co)  +  max{0,  — 

1  —  a 


Y  +  Cq} 


(II.9) 


being  a  (scaled)  Koenker- Bassett  error  (Koenker,  2005).  Quantile  regression  centers 
on  computing  this  argmin  with  “minimizing  the  error  of  Y  —  Cq  over  Co  €  IB  '  replaced 
by  “minimizing  the  error  of  Y  —  f(X)  over  a  class  of  functions  /  :  IRn  — >  1R ,”  often 
taken  to  be  the  affine  functions.  We  view  qa(Y)  as  the  “closest”  scalar  to  the  random 
variable  Y  under  a  Koenker-Bassett  error. 

If  our  goal  simply  were  to  estimate  qa(Y)  of  a  loss  random  variable  Y  for  a 
given  probability  level  a  G  (0,1),  the  above  expressions  would  have  sufficed,  pos¬ 
sibly  passing  to  an  empirical  distribution  given  by  a  sample  if  Fy  is  unknown.  In 
the  present  context,  however,  connections  with  the  underlying  explanatory  random 
vector  X  and  the  focus  on  the  “approximation”  of  Y  warrants  a  parallel  develop¬ 
ment  to  that  of  quantile  regression  but  now  centered  on  a  superquantile.  In  view  of 
the  above  review  of  quantile  regression,  it  is  clear  that  superquantile  regression  will 
involve  the  minimization  of  some  measure  of  error  that  returns  the  superquantile  as 
argmin.  Classical  least  squares  regression  can  be  viewed  similarly  as  returning  a  (con¬ 
ditional)  expectation  as  argmin  when  minimizing  the  mean-square  measure  of  error, 
i.e. ,  E[Y]  =  argminCoeR E[(Y  —  cq)2].  The  next  subsection  develops  such  a  measure 
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of  error  by  first  constructing  a  corresponding  measure  of  regret,  for  the  superquantile 
as  the  statistic. 

4.  Superquantile  Regret  and  Error  Measures 

We  start  this  subsection  by  establishing  the  finiteness  of  a  superquantile  under 
the  assumption  that  the  loss  random  variable  Y  has  a  finite  second  moment. 

We  know  from  Rockafellar  and  Uryasev  (2013)  that  qa  is  a  convex,  positively 
homogenous,  monotonic,  and  averse  functional  on  C2  for  a  G  (0,1).  From  Theorem 
3  in  Rockafellar  and  Royset  (2014b),  see  also  Rockafellar  et  al.  (2014),  we  know  that 
qa  is  bounded  as  stated  next.  We  adopt  the  notation  cr2(Y )  =  E[{Y  —  E[Y'])2]. 

Proposition  II.  1.  For  Y  G  C2  and  a  G  (0, 1)  one  has  that 

UY)  <  £[11  +  -Y=o(Y).  (11.10) 

V  l  —  o 

Proof. 

Suppose  that  the  quantile  qa(Y),  viewed  as  a  function  of  the  probability  level,  is 
continuous  at  a.  Let  Ia  be  the  indicator  function  of  the  interval  [qa(Y),  oo).  We  then 
have  by  the  Schwarz  inequality  that 

(1  -  a)q«(Y  -  £[Y])  =  E[(Y  -  E{Y})Ia\ 

<  ^ E\(Y  -  E\Y]Y)^E\if] 

<  cr(F)\/l  —  a. 

Then,  since  qa(Y  —  E^P])  =  qa(Y)  —  E[P],  the  result  follows  from  dividing  by  1  —  a. 
Thus,  (II.  10)  is  valid  under  the  continuity  assumption  about  the  quantile,  which  is 
true  for  all  but  at  most  countable  many  a.  By  continuity  on  both  sides  of  (II.  10) 
with  respect  to  a,  it  must  then  hold  for  all  a  G  (0, 1). 

□ 

The  measure  of  regret  at  probability  level  a  G  (0, 1)  that  serves  in  the  context 
of  superquantile  regression  is  defined  for  any  loss  random  variable  Y  as 

%(Y)  :=  Vo(P),  (IL11) 

1  —  a 
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where 

Vo 00:=/  max{0,  qp(Y)}d/3.  (11.12) 

Jo 

These  expressions  appear  in  Rockafellar  and  Royset  (2014b),  where  their  discovery, 
which  is  related  to  the  Hardy-Littlewood  transform,  is  described.  Here,  we  provide 
the  alternative,  direct  proof  of  Rockafellar  et  al.  (2014),  on  how  these  expressions 
lead  to  the  superquantile  as  optimal  solution  of  (II. 7).  We  start,  however,  with  two 
preliminary  results  and  the  definition  of  a  corresponding  measure  of  error. 

Lemma  II. 1.  For  Y  G  C2, 

Vo(Y')  <  a(Y)  +  max{0,  E[Y]  +  a{Y)}.  (11.13) 


Proof. 

From  (II.  10)  and  (11.12)  we  have 

Vo (Y)<  f  max{0,  OY(P)}d0  for  0y(/3)  -  E[Y]  +  —L=a(Y).  (11.14) 

Jo  V  1  —  P 

We  consider  three  cases.  In  Case  1,  we  suppose  that  9Y(/3)  >  0  for  all  /3  G  [0,1]. 

Then  the  right  hand  side  of  (11.14)  is  given  by 

[  9Y(/3)dj3  =  E[Y]  +  a(Y)  [  (1  -  (3)~1/2df3  with  [  (1  -  f3)~1/2d/3  =  2.  (11.15) 
Jo  Jo  Jo 

Therefore,  V0(Y)  <  E[Y]  +  2 cr(Y)  in  Case  1.  In  Case  2a,  we  suppose  that  6y(/3)  <  0 

for  all  /3  G  (0, 1).  Then  obviously  V0(Y)  <  0.  Finally,  in  Case  2b,  let  9y(/3)  <  0 

for  some  f3  G  (0,1),  but  not  all.  Then  necessarily  a{Y)  >  0  and  E[Y]  <  — cr(Y), 

and  9Y(/3)  strictly  increases  with  respect  to  (3.  Let  a  be  the  unique  f3  G  (0, 1)  with 
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Oy(oi )  =  0,  namely  when  y/1  —  a  =  a(Y)/(— E\Y]).  Then  we  have  that 


max{0 ,9Y(/3)}d/3  =  /  0Y(/3)d/3 


=  (l-a)E[Y]  +  a(Y)  /  (1  -  /3)~1/2df3 

J  a. 

=  (1  -  a)E[Y]  +  2a(Y)Vl^6' 


E[Y]- 
°(Y? 
~ E{Y } 
<  <Y). 


E\Y) 


Thus,  in  Case  2b  we  get  V0(T)  <  cr(Y').  The  conclusion  then  follows  by  putting 
together  the  cases. 

□ 

We  observe  that  for  a  G  (0, 1),  VQ  is  also  a  convex,  positively  homogeneous, 
monotonic,  and  averse  functional  on  £2,  which  follows  from  the  properties  of  the 
superquantile  (Rockafellar  &  Uryasev,  2013),  and  by  the  above  result  it  is  also  finite, 
and  consequently  continuous.  A  corresponding  measure  of  error  is  defined  for  Y  e  C? 
by 

Sa(Y)  :=%(¥) -E[Y]  (11.16) 

and  referred  to  as  a  superquantile  error.  Obviously,  £a  is  also  convex  and  positively 
homogeneous.  It  also  satisfies  the  following  properties. 


Proposition  II. 2.  For  any  a  e  (0, 1)  and  Y  e  C'2 ,  a  superquantile  error  satisfies 

(a)  £a(Y )  =  0  when  Y  =  0, 

(b)  £a(Y)  >  0  when  Y  0,  and 

(c)  £a(Y)  >  min{l,  a/ (1  -  a)}\E[Y]  |. 

Proof. 

Since  qg(0)  =  0  for  all  (3  G  [0, 1],  (a)  follows  trivially. 
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Since  VQ  is  averse,  we  have  that  for  Y  E  C2  £a(Y )  =  Va(Y)  —  E[Y]  >  E[Y ]  — 
E[Y]  =  0  when  Y  is  not  a  constant.  To  complete  part  (b),  we  therefore  only  need  to 
consider  nonzero  constants.  If  Y  is  a  positive  constant  K,  then 
1 


1  —  a 


max{0 ,qp(Y)}d/3  —  E[Y]  >  /  max{0,  q/3(Y)}df3  —  E[Y] 

Jo 

>  K-E[Y ] 

>  0. 


If  Y  is  a  negative  constant  K ,  then 

-  [  max{0 ,qa(Y)}df3  —  E{Y]  =  -  [  max{0,  K}dj3  —  E\Y] 

1-a  J0  1  -  a  J0 

=  0  -E[Y] 

>  0, 

which  completes  part  (b). 

Since  qp(Y)  >  E\Y]  for  all  /3  G  [0, 1],  we  have  whenever  E[Y]  >  0  the  bound 


1  —  a 


Jo 


max{0,  qp(Y)}d!3  —  E[Y]  > 


> 


1  —  a 

a 

1  —  a 


max{0 ,  E[Y]}d/3  -  E[Y] 


E[Y). 


And  when  E[Y]  <  0, 


— ^ —  [  max{0 ,qp(Y)}d/3  -*,E[Y]  >  — ^ —  [  max{0,  E[Y]}df3  —  E[Y] 
l- a  J0  l- a  J0 

>  —E[Y]. 


Part  (c)  then  follows  by  combining  the  two  results. 

□ 

By  Proposition  II. 2  and  the  above  discussion,  £a  is  a  regular  measure  of  error. 
We  now  show  that  a  superquantile  is  a  unique  optimal  solution  of  optimization  prob¬ 
lems  involving  VQ  and  £a.  As  mentioned,  the  connection  between  a  superquantile  and 
VQ  is  also  reached  in  Theorem  7  of  Rockafellar  and  Royset  (2014b)  through  different 
means.  Here  we  derive  the  direct  proof  and  the  connection  with  a  superquantile  error 
(see  Rockafellar  et  ah,  2014). 
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Theorem  II. 1.  (Superquantile  as  optimal  solution)  For  Y  G  C2  and  a  G  (0, 1)  , 

qa(Y)  =  argmin{c0  +  Va(Y  —  c0)} 

coG-R 

=  argmin£a(y  —  c0).  (11.17) 

c0eR 

Proof. 

Let  <p(c)  =  c  +  Va(Y  —  c )  and  ipp(c)  =  max{0,  qp(Y)  —  c}.  These  are  both  convex 
functions  of  c,  and  i[)p  is  nonincreasing.  We  can  use  the  criterion  that 

c  G  argrnin <p(c)  <//+(c)  >  0,  <//_(c)  <  0, 

C 

where,  because  of  the  monotonicity  of  ipp, 

F+{c)  =  l  +  (' ipp)'_(c)d(3 ,  <p'_(c)  =  1  +  — [  (il>p)'+(c)dp, 

f  -1  if  qp(Y)  >c,  ,  /  -1  if  qp(Y)  >  c, 

(  0  if  qp(Y)  <  c,  [  0  if  qp(Y )  <  c. 

Therefore 

[  (^0 )+(cW  =  [  (M-{c)dp 

Jo  Jo 

=  -(1-7)  for  c  =  q1{Y), 

in  which  case  {$p)'{c)  =  {ipp)+(c)  =  (ppp)'_(c)  =  -a).  Thus,  (ipp)'(c)  =  0 

corresponds  to  c  =  g7(Y)  for  7  =  a.  Consequently,  the  first  equality  of  the  theorem 
holds.  The  second  follows  directly  from  (11.16)  and  the  fact  that  a  constant  in  an 
objective  function  is  immaterial  with  regard  to  the  argmin. 

□ 

The  foundations  for  quantile  regression  are  given  by  equations  (II. 6)  and  (II. 8). 
Analogously,  the  expressions  in  (11.17)  provide  the  path  to  superquantile  regression 
as  developed  in  Section  II. B.  In  fact,  Theorem  II.  1  shows  that  qa(Y )  is  the  uniquely 
“closest”  scalar  to  Y  in  the  sense  of  the  superquantile  error.  The  optimal  value  in 
(11.17)  defines  a  measure  of  risk  (see  Rockafellar  &  Royset,  2014b) 

77a(F)  :=  min{c0  +  Va(Y  -  c0)} 

coG-R 

=  qa(Y)  +  Ya(Y-qa(Y)) 


26 


for  Y  G  C2  analogously  to  qa(Y )  in  (II. 7).  A  corresponding  measure  of  deviation  is 
given  by 

t>a(Y)  :=  min  Sa(Y  —  c0) 

coE-R 

=  na(Y)-E[Y].  (11.18) 

We  note  that  parallel  to  (II. 2)  (see  Rockafellar  &  Royset,  2014b), 

na{Y)  =  -^—  f'qpWdP 

I  ®  J a 

and,  consequently, 

Va(Y)  =  -^—  [  qp(Y)d/3  —  E[Y], 

t  ^  J  a 

The  measures  of  regret,  error,  risk,  and  deviation,  Vaj  £a,  TZa,  and  Va,  respectively, 
for  a  probability  level  a  G  (0, 1),  form  a  family  of  risk  quadrangles,  in  the  sense  of 
Rockafellar  and  Uryasev  (2013),  that  corresponds  to  the  superquantile  as  the  statistic, 
as  shown  below. 


Superquantile-based  Quadrangle:  (at  any  probability  level  a  G  (0, 1)) 

Error  measure  =  £a(Y)  =  f'  max  {0,  qp(Y)}  d/3  —  E[Y] 

Risk  measure  =  7 Za(Y)  =  qa(Y )  =  a-second-order  superquantile 
Deviation  measure  =  Va(Y)  =  qa{Y )  —  E[Y] 

Regret  measure  =  Va(Y)  =  J2  max  {0,  qp(Y)}  d/3 
Statistic  =  S(Y)  =  qa(Y)  =  u-superquantile 


We  note  here  that  the  measure  of  deviation  T>a  plays  a  central  role  in  the 
remainder  of  the  dissertation  as  it  facilitates  simplifications,  goodness  of  fit  tests,  and 
computational  methods. 
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B.  SUPERQUANTILE  REGRESSION 

Theorem  II.  1  and  the  development  leading  to  quantile  regression  direct  us 
to  a  new  regression  methodology  that  is  centered  on  a  superquantile  error.  The 
next  subsection  poses  the  regression  problem,  provides  its  properties,  and  discusses 
stability  under  perturbations.  The  section  ends  with  a  discussion  of  superquantile 
tracking. 

1.  Superquantile  Regression  Problem 

While  Theorem  II.  1  shows  that  the  “best”  scalar  approximation  of  a  random 
variable  Y  in  the  sense  of  a  superquantile  error  is  the  corresponding  superquantile, 
we  now  go  beyond  the  class  of  constant  functions  to  utilize  the  connection  with  an 
underlying  explanatory  random  vector  X.  We  focus  on  regression  functions  of  the 
form 

f(x )  =  c0  +  (c,  h(x)),  c0  G  iR,  c  G  Mm, 

for  a  given  “basis”  function  h  :  ]Rn  — >■  lRr" .  This  class  satisfies  most  practical  needs 
including  that  of  linear  regression  where  m  =  n  and  h(x)  =  x.  Extensions  beyond 
this  class  are  also  possible  but  not  dealt  with  in  this  dissertation. 

We  now  define  the  Superquantile  Regression  Problem  SqR,  for  any  h  :  lRn  — >■ 
Mm  and  a  G  (0, 1),  where 

Z(c0,c)  :=  Y  —  (c0  +  (c,  h(X))) 

is  the  error  random  variable,  whose  distribution  depends  on  Cq,  c,  h,  and  the  joint 
distribution  of  (A",  Y).  We  denote  by  C  C  Mm+1  the  set  of  optimal  solutions  of  SqR 
and  refer  to  (cq,  c)  G  C  as  a  regression  vector. 


Superquantile  Regression  Problem: 

SqR  :  min  £a  (Z(c0,c))  =  -^ —  [  max{0,  qp(Z(c0,  c))}  d/3^  E[Z(co,  c)]. 

c0eR,c£Rm  L  —  a  J o 


The  objective  function  Sa(Z is  well-defined  and  finite  when  the  distri¬ 
bution  of  (X,  Y)  and  h  is  such  that  Z(co,c)  G  C2  for  all  cq  G  JR,  and  c  G  JR1" .  A 
sufficient  condition  that  ensures  this  property  is  that  Y,  h\(X), ...,  hm( X)  G  C2.  We 
adopt  the  notation 


H  =  h( X),  Hi  =  hi(X),  i  —  1,2, 

Lemma  II. 2.  IfY,  Hi, ...,  Hrn  G  C2,  then  Z(c0,  c )  G  £2  for  all  c0  G  JR,  and  c  G  JRm . 

In  surrogate  estimation,  c0  +  (c,h(X)),  with  (c0,c)  G  C,  provides  the  best 
approximation  of  Y  in  the  sense  of  a  superquantile  error.  For  example,  after  having 
computed  (c0,c),  the  analysis  could  proceed  with  examining  the  moments,  quantiles, 
and  superquantiles  of  c0  +  (c,  h(X))  as  surrogates  for  the  corresponding  quantities  of 
Y.  If  X  is  Gaussian  and  h  is  affine,  then  Co  +  (c,  h( X))  is  a  Gaussian  approximation 
of  Y  easily  examined  and  utilized  in  further  studies.  It  may  also  be  of  interest  to 
examine  c0  +  (c,  h( X))  under  hypothetical  distributions  of  X. 

As  a  direct  consequence  of  the  Regression  Theorem  in  Rockafellar  and  Uryasev 
(2013)  (see  also  Theorem  3.1  in  Rockafellar  et  ah,  2008),  we  obtain  that  a  regression 
vector  can  equivalently  be  determined  from  a  measure  of  deviation  T>a . 

Proposition  II. 3.  Suppose  that  Y,H\, Hrn  G  C2 .  Then,  the  set  of  regression 
vectors  C  of  SqR  is  equivalently  obtained  as 

C  =  { (cb,  c)  G  JRm+1  |  c  G  argmin  Va(Z0(c)),  c0  =  qa(Z0(c))  \  , 
l  c&Rm  J 

where  Z0(c )  :=  Y  —  (c,  h{X)). 

Proposition  II. 3  implies  computational  advantages  as  the  (m  +  l)-dimensional 
optimization  problem  SqR  is  replaced  by  a  problem  in  m  dimensions  with  a  simpler 
objective  function,  which  we  fully  utilize  in  Chapters  III  and  IV.  Moreover,  the  result 
also  proves  to  be  beneficial  in  analysis  of  regression  vectors. 
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We  now  define  the  Deviation-based  Superquantile  Regression  Problem  DSqR , 
for  any  h  :  JRn  — >  JRm  and  a  G  (0, 1): 

Deviation-based  Superquantile  Regression  Problem: 

DSqR  :  min  Va  (Z0(c))  =  — f  qp{ZQ{c))dP  -  E[Z0(c )], 

cei?m  1  -  cl  Ja 

with  c0  being  obtained  by  setting  c0  =  qa(Z0(c)). 

The  existence  of  a  regression  vector  is  ensured  by  the  next  result,  which  also 
provides  conditions  for  uniqueness. 

Theorem  II. 2.  (Existence  and  uniqueness  of  regression  vector)  IfY ,  H i, Hm  G  C2, 
then  SqR  is  a  convex  problem  with  a  set  of  optimal  solutions  C  that  is  nonempty, 
closed,  and  convex. 

(a)  C  is  bounded  if  and  only  if  the  random  vector  X  and  the  basis  function  h  satisfy 
the  condition  that  (c,  h( X)}  is  not  constant  unless  c  =  0. 

(b)  If  in  addition,  for  every  (co,  c),  (d0,d)  G  JR,n+1,  with  c  ^  d ,  there  exists  a  (3q  G 
[0, 1)  such  that 

0  <  qp{Z{cQ,  c)  +  Z(c'0,  d))  <  qp{Z(co,  c))  +  qp{Z(d0,  d))  (11.19) 

for  all  /3  G  [/?o,  1),  then  C  is  a  singleton. 

Proof. 

Since  Y  G  C2  implies  that  Sa(Y)  <  oo,  by  Lemma  II. 1,  we  deduce  the  two  first 
conclusions  from  Theorem  3.1  in  Rockafellar  et  al.  (2008).  Hence,  we  only  need  to 
show  that  C  is  a  singleton. 

Suppose  for  the  sake  of  a  contradiction  that  (c0,c),  (d0,d)  G  C  and  (c0,c)  ^ 
(d0,d),  with  corresponding  optimal  value  f  >  0,  i.e.,  f  =  Sa(Z(c0,c ))  =  £a(Z(d0,d)). 
We  consider  two  cases. 

First,  suppose  that  £  =  0.  By  Proposition  II. 2,  Z(co,c)  =  Z(d0,d )  =  0  and 
consequently 

Co  +  (c,  H)  =  Cq  +  (d ,  H ), 


30 


which  implies  that  (c  —  c',  H)  =  Cq  —  c0.  Under  the  assumption  that  (c,  h(X))  is  only 
constant  when  c  =  0,  we  must  have  that  c  —  d  =  0.  Then,  also  c'0  —  Co  =  0  follows, 
which  contradicts  the  hypothesis  that  (co,c)  ^  (c'0,  cr). 

Second,  suppose  that  £  >  0.  If  c  =  c',  then  a  direct  consequence  of  Propo¬ 
sition  II. 3  and  the  fact  that  every  random  variable  has  a  unique  superquantile  at 
each  probability  level,  is  that  also  cq  =  c'0,  which  again  contradicts  our  hypothesis. 
Consequently,  we  focus  on  the  case  with  c  ^  c',  for  which  there  exists  a  (30  such  that 
(11.19)  holds  for  all  /3  G  [/d0,  1)-  Trivially,  then 

max{0,  qp{Z(co,  c )  +  Z(c'0,  c'))}  <  max{0,  qp{Z(c0,  c))}  +  max{0,  qp{Z(c^,  c'))} 

for  f3  G  [/do,  1)-  If  /d  G  (0, 1)  is  such  that  qp(Z(co,  c )  +  Z(c'0,  c '))  <  0,  then 

max{0,  qp(Z (c0,  c)  +  Z(c'0,  c'))}  <  max{0,  c))}  +  max{0,  q^(Z(c'0,  c'))} 

as  the  left-hand  side  vanishes  and  the  right-hand  side  is  nonnegative.  Hence, 

i 

max{0,  qp{Z{co,  c )  +  Z(c'0,  c'))}d/3 

<  /  max{0,  qp(Z(c0,  c))}d/3  +  /  max{0,  qp(Z(dQ,  c'))}d/3 
Jo  Jo 

and  also 

£a(Z(co,c)  +  Z(c'0,c'))  <  £a(Z(co,  c))  +  £q(Z(cq,  c')).  (11.20) 

Let 

(c",c")  =  (l/2)(c0,c)  +  (l/2)(c',c') 

and  therefore 

2Z(c",c")  =  Z(c0,c)  +  Z(c'0,c'). 

By  the  optimality  of  £,  the  positive  homogeneity  of  Sa,  and  (11.20),  we  find  that 

2^  <2 £a(Z(c",c")), 
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and  that 


2£a{Z (cq,  c"))  =  4(2 Z(c/',c")) 

<  £a(Z(c0,c))  +£a(Z(c'0,c')). 

Since 

£a{Z (c0,  c))  +  £a(Z (c'q,  c'))  =  2£, 

we  finally  get  that 

2£  <  2f , 

which  cannot  hold.  In  view  of  this  contradiction,  the  conclusion  follows. 

□ 

While  Theorem  II. 2  gives  a  sufficient  condition  for  uniqueness  of  the  regression 
vector,  in  general  uniqueness  cannot  be  expected.  For  example,  suppose  that  the 
random  vector  (X,  Y),  with  X  scalar  valued,  has  the  possible  and  equally  likely 
realizations  (1, 1),  (2,  2),  and  (3, 1).  Then,  qp(Z0(c))  =  max{l  —  c,  2  —  2c,  1  —  3c}  for 
(3  >  2/3  and  E[Z0(c)\  =4/3  —  2 c.  It  is  straightforward  to  show  that  for  a  >  2/3, 
any  c  G  [—1, 1]  minimizes  Va(Z0(-)).  Consequently,  in  view  of  Proposition  II. 3,  any 
c  G  [—1, 1],  with  a  corresponding  c0  =  max{l  —  c,  2  —  2c,  1  —  3c},  minimizes  £a(Z(- ,  •)) 
for  a  >  2/3,  as  shown  in  Figure  5.  The  minimum  error  is  2/3. 

A  unique  regression  vector  is  indeed  achieved  in  the  normal  case  as  stated 

next. 

Proposition  II. 4.  Suppose  that  ( H ,  Y)  is  normally  distributed  with  positive  definite 
variance-covariance  matrix.  Then,  C  is  a  singleton. 

Proof. 

Let  E  be  the  variance-covariance  matrix  of  ( H,Y ),  with  Cholesky  decomposition 
E  =  LLT .  For  any  f3  G  (0, 1)  and  c  G  !Rm ,  Z0(c)  is  also  normal  with  mean  E[Zq{c)]  = 
(c,  E[(H,  y)])  and  variance  a2(Z0(c ))  =  (c,  Ec),  where  c  =  (— c,  1).  Thus, 

qp(Z0(c))  =  E[Z0{c)\  +  kpa(Z0(c))  =  E[Z0(c)\  +  kg\\LT  c\\, 
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0  12  3  4 


Explanatory  Variable  x 


Figure  5.  Example  of  multiple  optimal  solutions  for  problem  SqR. 


where  kp  =  </>($^1(/3))/(  1  —  (3),  with  </>  and  $  being  the  standard  normal  probability 
density  and  cumulative  distribution  functions,  respectively. 

For  c,  d  G  Mm,  with  c  ^  d,  there  is  no  constant  k  >  0  such  that  (— c,  1)  = 
k(—d,  1).  Let  c  =  (— c,  1)  and  d  =  (—d,  1).  Since  E  is  positive  definite,  the  upper- 
triangular  matrix  LT  is  unique  and  full  rank.  Consequently,  the  null  space  of  LT 
contains  only  the  zero  vector  and  LT(c  —  kd )  ^  0  for  all  scalars  k  >  0.  Since  the 
triangle  inequality  for  two  vectors  holds  strictly  whenever  the  two  vectors  cannot  be 
expressed  as  a  positive  multiple  of  each  other,  we  therefore  find  that 

||Ltc  +  Ltc'||  <  \\Lt5\\  +  ||LTc'||. 


Now  for  the  sake  of  a  contradiction  suppose  that  c,  d  G  JRm  both  minimize 
T>a(Zo(-))  and  attain  the  minimum  value  £  G  JR,  but  c  7^  d .  Let 


d'  =  (l/2)c+  (l/2)c/,  d'  —  (—d',  1),  and  =  /  kpdf3/(l  —  a)  >  0. 
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Then, 

Va(Z0(c"))  =  — f1  qp(z0(c"))df3  -  E[Z0(c")] 

1  ^  J a 

=  E[Z0(c :")]  +  la\\LTc"\\  -  E[Z0(c :")] 

=  ^||Ltc  +  Ltc'|| 

<  y(l|ITc||  +  ||£T?||), 

and  since 

^(||Ltc||  +  ||Ltc'||)  =  1-(yE[Z0(c)\  +  ^Q\\LTc\\-E[Z0(c)])  + 

+  1  (^[Z0(c')]  +  laWL^c'W  -  E[Z,{c')]) 

=  l(Va(Z0(c)j)  +  ±(pa(Z0(<!)j) 

=  e, 

we  have  that  Va(Z0(c"))  <  £.  However,  this  contradicts  the  optimality  of  c,  d  and  we 
reach  the  conclusion. 

□ 

We  next  turn  to  consistency  and  stability  of  the  regression  vector.  Of  course, 
the  joint  distribution  of  (A",  Y)  is  rarely  available  in  practice  and  one  may  need  to 
pass  to  an  approximating  empirical  distribution  generated  by  a  sample.  Moreover, 
perturbations  of  the  “true”  distribution  of  (A,  Y)  may  occur  due  to  measurement 
errors  in  the  data  and  other  factors.  We  consider  these  possibilities  and  let  (Xu,  Yv)  be 
a  random  vector  whose  joint  distribution  approximates  that  of  (A,  Y)  in  some  sense. 
For  example,  (. A u,  Yu)  may  be  governed  by  the  empirical  distribution  generated  by  an 
independent  and  identically  distributed  sample  of  size  v  from  (A ,Y).  Presumably, 
as  v  — >■  oo,  the  approximation  of  (A,  Y)  by  (A",  Yv )  improves  as  stated  formally 
below.  Regardless  of  the  nature  of  ( Xv ,  Y"),  we  define  the  approximate  error  random 
variable  as 

Zv{c0,c)  :=  Yv -co- {c,h{Xv)), 
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and  the  corresponding  Approximate  Superquantile  Regression  Problem  SqRv  as  fol¬ 
lows: 


Approximate  Superquantile  Regression  Problem: 

SqRu  :  min  £a  (Zu(c0,  c))  =  — - —  [  max{0,  qn(Zv(co,  c))}  df3  — 

c0ei?,cei?m  1  —  a  J0 

—E[Zu(c0,  c)]. 


The  next  result  shows  that  as  (Xu,  Yv)  approximates  (. X ,  Y),  a  regression  vec¬ 
tor  obtained  from  SqRv  approximates  one  from  SqR,  which  provides  the  justification 
for  basing  a  regression  analysis  on  SqRv .  Below,  we  let  — >d  denote  convergence  in 
distribution  and 

Hv  =  h{Xv ),  H\  =  hi{ Xu),  i  =  1,2, ...,  m. 

Theorem  II. 3.  (Stability  of  regression  vector)  Suppose  that  (. Xl’,Y1' ),  u  =  1,2,..., 
and  (X,Y)  are  n  +  1-dimensional  random  vectors  such  that  (X Y,YU)  — >d  (X,Y) 
and  that  the  basis  function  h  is  continuous  except  possibly  on  a  subset  S  C  Mn  with 
P{ X  G  S}  =  0.  Moreover,  let  H, ,  Y  e  C2,  sup uE[{Hf)2]  <  oo,  i  =  1,2,  and 

supu  E[(YU)2}  <  oo. 

If  {(cq,  cu)}(f=1  is  a  sequence  of  optimal  solutions  of  SqRu,  with  a  G  (0,1), 
then  every  accumulation  point  of  that  sequence  is  a  regression  vector  of  SqR. 

Proof. 

Let  (co,  c)  G  lRm+1  be  arbitrary.  By  the  continuous  mapping  theorem  (see  for  example 
Theorem  29.2  in  Billingsley,  1995), 

Zv(co,  c )  =  Yv  -  c0  -  (c,  h{Xv))  -A d  Z  (c0,  c)  =  Y -co-  (c,  h(X)). 

By  the  assumed  moment  conditions  in  (II.  1),  there  exists  a  constant  M  <  oo  that 
bounds  from  above  the  terms 

maxE[\Hi\\,  ma  xE[(H.f)2},  sup  E[\H?  |],  sup  E[(Hf)2], 

1  1  v,i  v,i 

and  E[\Y\],  E[Y2},  sup£[|T,,|],  sup E[(YU)2}. 
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In  view  of  Lemma  II. 2  and  its  proof,  we  deduce  that 


E[(YV  -  c0  -  (c,  Hv))2}  <M  +  2(\\c\\m1/2M  +  (M  +  \c0\)\\c\\mM)  +  \\c\\2mM  (11.21) 
for  all  v.  Hence,  Zu{cq,c)  is  uniformly  integrable  (for  fixed  co,c)  and 

E[Zv(co,c)]  E[Z (c0,  c)]  <  oo;  (11.22) 


see  Billingsley  (1995),  Theorem  25.12  and  its  corollary. 

By  Theorem  4  in  Rockafellar  and  Royset  (2014b),  a  sequence  of  random  vari¬ 
ables  converges  in  distribution  to  a  random  variable  if  and  only  if  the  corresponding 
a-superquantiles,  viewed  as  functions  of  the  probability  level  a,  converge  uniformly 
on  every  closed  subset  of  (0, 1).  Consequently,  qp(Zu(co,  c))  —>■  qp(Z(c o,  c))  uniformly 
in  P  on  closed  subsets  of  (0,1).  Moreover,  since  the  O-superquantile  coincides  with 
the  expectation,  (11.22)  implies  that  q0(Zu(c0,  c))  — »  q0(Z(c0,c ))  also  holds.  These 
facts  and  the  observation  that  the  superquantile  of  any  random  variable  is  continuous 
and  nondecreasing  as  a  function  of  the  probability  level,  ensure  that  for  any  e  >  0 
and  5  G  (0, 1),  there  exists  an  integer  i/(e,  5)  such  that  for  all  v  >  z/(e,  5), 

sup  \qp{Zv{cQ,c))  -qp(Z(co,c))\  <  6  .  (11.23) 

/3e[o,i-5]  ~  d) 


Then, 


max  <  0,  qp(Zu(co,  c 


-1—5 


max  <  0,  qp(Z(coi  c))  \d/3 


< 


< 


»l-5 


"1-5 


Qp(Zv(co,c))  —  qp(Z (c0,  c)) 


dp 


dp 


I  o  2(1-5) 


e 

<  - 
“  2 


(11.24) 


for  all  v  >  z/(e,  5).  Following  an  argument  similar  to  that  in  Lemma  II.  1,  we  find  that 


'1-5 


max{0 ,qp{Z{cQ,c))}dP  <  51/2u(Z(c0,  c)) 


max  ( 0,  6E[Z(c0 ,  c)]51/2u(Z(c0,  c))|.  (11.25) 
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Moreover,  the  reasoning  that  leads  to  (11.21)  also  gives 


E[Z(cq,c)]  <  M +\c0\  +  \\c\\mM. 


(11.26) 


These  facts  show  that  there  exists  a  positive  constant  M  <  oo  (which  depends  on  Co 
and  c)  such  that  \E[Z(cq,  c)]|,  a(Z(co,  c))  <  M.  Hence,  from  (11.25),  we  find  that 

-1  ^  _  . 

(11.27) 


'1-5 


max  <  0,  qp(Z(c0,  c))  \d/3  <  3 Md1^2 


Let  e  <  12 M  and  5e  =  (e/(12M))2.  Then,  3 MdXJ2  =  e/4  and 


'1—5, 


max  <  0,  qp[Z (c0,  c 


e 

< 

“  4 


(11.28) 


An  identical  result  holds  for  Zu(cq,  c ).  Let  qp/Zu{co,  c))+  =  max{0,  qp/Zu(co,  c))}  and 
qp(Z(c0,  c))+  =  max{0,  qp(Z(c0,  c))}.  Consequently,  for  all  u  >  is(e,5e), 

^  qp(Zu(c0}c))  d/3  -  [  qp(Z(c0,c))  d/3 


< 


[•  1— 5e  /*1  — 5e 

/  g/3 (Zv(co,c))+dP  -  /  qp(Z(c0,  c))+d/3 

'o  7o 

+  f  qp(Zu(c0,c))  d/3  +  [  qp(Zu(c0,c))  d/3 


'1-5, 


'1-5, 


e  e  e 
- 2+4+4 
<  e. 


The  fact  that  E[Zu/cq,  c)]  — *  A[Z(co,c)]  <  cxd  and  the  assumption  that  (co,c)  is 
arbitrary,  imply  that  £a(Zl'(-,  •))  — >■  £/,(Z(-,  •))  pointwise  on  iRm+1 .  Lemma  11.1  and 
the  above  moment  assumptions  imply  that  £a(Zu(-,  •))  and  £a(Z(-,  •))  are  finite- valued 
functions.  They  are  also  convex,  which  follows  directly  from  the  convexity  of  £a  on  C2 
and  the  affine  form  of  Zu  and  Z  as  functions  of  cq  and  c.  Consequently,  by  Theorem 
7.17  in  Rockafellar  and  Wets  (1998),  £a(Zu£,  •))  epiconverges  to  £a(Z(-,-)).  The 
result  then  follows  from  Theorem  7.31  in  Rockafellar  and  Wets  (1998). 

□ 
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When  the  Approximate  Superquantile  Regression  Problem  SqR1'  is  constructed 
using  an  independent  identically  distributed  sample  of  size  v  from  the  distribution 
of  (X,Y),  we  obtain  the  following  corollary  which  follows  from  the  properties  of  the 
empirical  distribution. 

Corollary  II. 1.  Suppose  that  the  basis  function  h  is  continuous  except  possibly  on  a 
subset  S  C  JRn  with  P{X  e  S'}  =  0  and  that  H, ,  Y  6  C? ,  i  =  1,2,  ...,m.  Moreover, 
let  (Xu,  Yv)  be  distributed  according  to  the  empirical  distribution  generated  by  an  in¬ 
dependent  and  identically  distributed  sample  of  size  v  from  the  distribution  of  (X,Y). 
Then,  the  conclusion  of  Theorem  II.  3  holds. 

We  next  examine  the  rate  of  convergence  of  regression  vectors  obtained  from 
the  approximate  problem  SqRu  to  those  of  SqR  corresponding  to  the  “true”  distribu¬ 
tion.  It  appears  difficult  to  obtain  asymptotic  distribution  theory  for  superquantile 
regression  without  additional  assumptions,  which  among  other  consequences  should 
ensure  unique  optimal  solutions  of  SqR.  We  prefer  another  route  that  leads  to  a  rate 
of  convergence  result  under  mild  assumptions. 

Quantification  of  the  stability  of  the  set  of  optimal  solutions  of  an  optimiza¬ 
tion  problem  under  perturbations  depends  on  a  “growth  condition”  of  the  problem, 
which  is  difficult  to  quantify  for  SqR ;  see  Section  7J  in  Rockafellar  and  Wets  (1998). 
Consequently,  we  focus  on  the  better  behaved  e-regression  vectors  of  SqR  defined  for 
e  >  0  as 

|  (c0,e,  Ce)  G  lRm+1 

with  an  analogous  definition  of  the  e-regression  vectors  of  SqRv  denoted  by  Cf.  The 
rate  with  which  Cf  tends  to  Ce  depends,  naturally,  on  the  rate  with  which  (A X,YU), 
underlying  SqRu ,  tends  to  (X,  Y)  of  SqR  in  some  sense.  Before  we  make  a  precise 
statement,  we  introduce  a  convenient  notion  of  distances  between  any  two  nonempty 
sets  A,  B  C  dRm+1 .  For  p  >  0,  let 

dIp(A,  B )  :=  inf {77  >  0|A  D  pIB  C  B  +  pIB,  B  D  pIB  C  A  +  r/IB}, 


Sa(Z(c0}€,ce))  <  min 

coS-R.ce-R" 


£a{Z (cq,  c))  + 
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where  IB  is  the  Euclidean  ball  in  Mm+ 1  with  unit  radius  and  center  at  the  origin. 
Roughly,  dIp(A,B)  is  the  smallest  amount  the  sets  need  to  be  “enlarged”  to  ensure 
they  contain  the  other  one,  with  an  exclusive  focus  on  points  no  further  from  the 
origin  than  p.  This  restriction  facilitates  the  treatment  of  unbounded  sets. 

As  we  see  next,  the  rate  of  convergence  is  directly  related  to  the  rate  with 
which  the  random  vector 

A1"  :=  (Hv  -  H,YV  -Y), 

describing  the  approximation  error,  tends  to  zero. 

Theorem  II. 4.  (Rate  of  convergence  of  regression  vector)  Suppose  that  (XU,YU), 
v  —  1,2, ...,  arid  (X,Y)  are  n  +  1-dimensional  random  vectors  generating  SqRv  and 
SqR,  respectively.  Moreover,  let  H, ,  Y  e  C2 ,  sup VE[(H?)2]  <  oo,  i  =  1,2 and 
sup;y  E[(YU)2]  <  oo.  Let  p0  >  0  be  such  that  p0!B  n  C  ^  0  and  p0IB  D  Cv  ^  0. 

Then,  for  p  >  p0,  there  exist  positive  constants  k\ ,  k-2,  and  k%  (dependent  on 
p)  such  that  for  any  e  >  0  and  v  —  1,2, ..., 

d„(c:,c,)<  (1+^)  S[||A1]  max  jo.log  (  ^  )  |  +  kfj  +  fc3||E[A-]|| 

whenever  i?[||AI'||]  >  0  and  dIp(Cf,Ce )  =  0  otherwise. 

Proof. 

By  Theorem  3(a)  of  Rockafellar  and  Royset  (2014b),  for  f3  G  [0, 1), 

Qp(Zv(co,c))  -  qy(Z(c0,c))  <  ^-^E[\Zv(co,  c)'«-  Z(c0,  c)|], 

and  since 

tYeIIZ"(co,c)-Z(c0,c)B  =  j-Eeikc, 

we  get  that 

qp{Zv{co,c))-qp{Z{c0,c))  <  j^-^E[\{c,  A")\] 

<  r^g||c||£?[||A*'||],  (11.29) 
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where  c  =  (— c,  1).  Then,  for  5  G  (0, 1), 


pi— 8  pi— 8 

/  max{0,  qp(Zu(c0,  c))}df3  —  /  max{0,  qp(Z(c0,  c))}d(3 
Jo  Jo 

<[  \qp(Zv(cQ,c))^qp(Z(co,c))\d0 
Jo 

<  ||c||£[||Al]  jf‘  ‘-J^d/3 

<-||c||f:[||A-||]logi.  (11.30) 

Let  p  >  po  and  M  be  an  upper  bound  on  first  and  second  moments  of  H,  | ,  H”  | ,  |Y|, 
and  \YV\  as  in  the  proof  of  Theorem  II. 3.  Since  |(c,  H)\  <  ||c||  Y1T=\  1-^*1  and  (c,  L7)2  < 
||c||2E:=i(^)2,  we  find  that  i?[|(c,  iL)|]  <  ||c||mM  and  E[(c,  H)'2]  <  ||c||2mM.  Con¬ 
sequently, 

E[(Y-c0  -  (c,  H)f]  <  E[{Y  —  c0)2]  +  2\E[(Y  —  c0)(c,  H)\ |  +  E[{c,  H)'2] 

<  M  +  2(\\c\\m1/2  M  +  (M  +  |c0|)||c||mM)  +  ||c||2mM. 

(11.31) 


Then,  for  ||(co,c)||  <  p,  it  follows  by  (11.26)  that 

\E[Z(cq,  c)]  |  <  M  +  p  +  pmM 

and  by  (11.31)  that 

/  \  1/2 
a(Z(c0,  c))  <  (M  +  2  ( pmJ'2M  +  (M  +  p)pmM )  +  p2mM\  , 

with  identical  bounds  for  |£,[Zz/(c0,  c)]  and  o‘(Z*/(c0,  c)).  Let  Mp  be  the  larger  of  the 
two  previous  right-hand  sides. 

By  (11.25),  analogously  to  (11.27),  we  have  that  for  ||(co,c)||  <  p, 

I  max{0,  qp(Z(co,  c))}d/3  <  ZMpS1^2  (11.32) 

J 1—5 

and  similarly  with  Z(cq,c)  replaced  by  Zv(cq,c). 
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We  also  find  that  for  ||(c0,c)||  <  p, 


E[Zv(co,  c)]  —  E[Z (c0,  c)] 


(c,E[^]) 

<  l|S||||S[A1|| 

<  (1+P)||£[A 


(11.33) 


Then,  collecting  the  results  of  (11.30),  (11.32),  and  (11.33),  and  for  ||(c0,  c)||  <  p, 
we  obtain 

£a(Zu(c0,  c ))  -  £a(Z(c0,  c )) 


< 


< 


nrax{0,  q/s(ZI/(c0,  c))}d/3  —  /  nrax{0,  qp(Z(c0,  c))}d/3 


E[Zu(c0,c)\-  E[Z{c0,c)} 

I 

r»l  —  S  /*1  — S 

max{0,  qp(Zu(co,  c))}d/3  —  /  max{0,  qp(Z(co,  c))}d(3 


"1  f1 

max{0,  qp(Zu(cQ,  c))}dfl  +  /  max{0,  qp(Z(cQ,  c))}d/3 


'  1—5 


'  1— (5 


+  E[Zv(co,c)]-E[Z(co,c)] 

<  -(1  +  p)E[\\Au\\\  logS  +  6M/V2  +  (1  +  p)||£[A*']||. 


(11.34) 


We  next  determine  the  choice  of  <5  G  (0, 1)  that  minimizes  the  previous  bound  and 
consider  two  cases.  First,  if 


0  <  kp  (£[||  A"||])2  <  1, 


with 


kp  :  = 


2(1  +  P) 

6M„ 


then  differentiation  gives  that  the  bound  is  minimized  with  S  =  /cp(if[||  A^l])2.  Second, 

if 


then 


M£[||A1])2  >  1, 

4(1  +  p)E[\\A1' 


Mp  < 
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and  the  bound 


-  (1  +  p)£[||  Al]  log ,5  +  6 Mf5'V  +  (1  +  P)||B[A-]|| 

<  -(1  +  p)B[||A-||] log*  +  4(1  +  ri£[||A"||],5'/2  +  (1  +  p)||£[A"]||, 

for  any  5  G  (0,1).  Consequently,  combining  the  two  cases,  there  exist  constants  k\ , 
h'2 ,  and  (which  depend  on  p),  such  that  for  ||(co,c)||  <  p, 

£a(Zu(c0,  c))  -  £a(Z(c0,  c)) 

<  /u£[||AI'||]  max  jo,  log  }  +  k*E[ II  Al]  +  h\\E[Au}\\ 

<£[IIA1]  (ki  max  |o,  log  (^||^||])|  +  k*j  +  M£[A1II- 

Direct  application  of  Example  7.62  and  Theorem  7.69  of  Rockafellar  and  Wets  (1998) 
then  yields  the  conclusion  for  i7[||Ai'||]  >  0,  where  the  additional  coefficient  (l  +  4p/e) 
originates  in  that  theorem.  Finally,  if  ,E[||AI'||]  =  0,  then,  in  view  of  (11.29)  and  the 
fact  that  this  implies  that  ||E,[AI/]||  =  0,  we  find  that  for  ||(co,c)||  <  p, 

\£a(Zu(c0,c))  -  £a(Z(c0,c))\  =0. 

The  final  conclusion  then  follows  by  again  invoking  Example  7.62  and  Theorem  7.69 
of  Rockafellar  and  Wets  (1998). 

□ 

Theorem  II. 4  shows  that  the  distance  between  C/  and  Ce  is  almost  proportional 
to  TT[||  A^ || ] ,  but  with  a  minor  correction  by  a  logarithmic  term.  If  the  approximation 
(XV,YV)  is  caused  by  measurement  errors  of  magnitude  1/u,  i.e.,  the  absolute  value 
of  each  component  of  (. Xu  —  X,YU  —  Y)  is  no  greater  than  1/u  almost  surely,  then 
E[||AI/||]  <  \Jm  +  1/u  and  the  expressions  can  be  simplified.  For  £  >  0,  logo:  <  x ^ 
for  sufficiently  large  x  G  M.  Consequently,  for  any  £  G  (0, 1)  and  sufficiently  large  u, 

<#,«?,£)  <(1  +  7)  -E, 
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where  k  >  0  can  be  determined  from  k\ ,  k2,  k:i ,  and  m.  That  is,  the  Euclidean 
distance  between  an  e-regression  vector  of  SqRu  to  one  of  SqR  is  0(z^_1)  for  £  G  (0, 1) 
arbitrarily  close  to  zero. 

2.  Superquantile  Tracking 

We  next  turn  to  the  situation  where  the  primary  concern  is  with  the  conditional 
loss  Y(x)  given  that  the  explanatory  random  variable  takes  on  specific  values,  X  =  x. 
We  seek  to  estimate  qa(Y(x))  for  x  G  Mn,  or  a  subset  thereof,  with  the  goal  of 
eventually  minimizing,  at  least  approximately,  qa(Y(x))  by  a  judicious  choice  of  x. 
Of  course,  with  incomplete  knowledge  about  the  distributions  of  Y(x)  this  is  a  difficult 
task  that  can  be  achieved  only  approximately.  For  example,  there  is  no  guarantee 
that  a  regression  function  /  =  c0  +  (c,  /&(•)),  with  (c0,  c)  G  C  obtained  by  solving  SqR 
using  a  G  (0,1),  tracks  qa(Y(x)),  i.e.,  f(x)  =  qa(Y(x))  for  all  x  G  Mn.  The  hope 
of  such  “exact”  superquantile  tracking  becomes  even  less  realistic  when  SqR  must 
be  replaced  by  an  approximation  SqRu  as  typically  required  in  practice.  However, 
“local”  superquantile  tracking  is  possible,  at  least  approximately,  as  stated  in  the 
next  proposition.  Moreover,  tracking  is  achieved  under  certain  model  assumptions. 
For  example,  if  we  have  that  Y  =  cq  +  (c,  X)  +e,  for  some  cq  G  M,  c  G  JRn ,  and  where 
e  is  independent  of  X,  then  superquantile  tracking  is  guaranteed;  see  Theorem  5.1  in 
Rockafellar  and  Royset  (2014a). 

Here  we  consider  the  situation  where  there  is  a  sample  of  Y(x)  for  some  values 
of  x,  but  this  sample  is  not  large  enough  to  allow  pointwise  estimation  of  qa(Y(x)) 
for  every  x  of  interest.  There  may  even  be  no  x  for  which  there  are  multiple  sample 
points  of  Y(x).  Concentrating  on  a  particular  x  G  JRn,  we  hope  to  estimate  qa(Y(x)) 
by  using  samples  from  Y(x)  for  x  near  x,  weighted  appropriately.  The  weights  should 
be  nonnegative,  sum  to  one,  and  can  be  thought  of  as  an  artificially  constructed 
probability  distribution  associated  with  the  sample.  Specifically,  suppose  that  Xi,  i  — 
l,...,z/,  are  the  points  where  the  sample  is  observed  and  i  =  1 ,  ...,zq  are  the 
corresponding  realizations  of  Y(xj).  When  estimating  a  superquantile  at  x,  we  put 
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more  “trust”  on  sample  points  taken  near  x  and  consequently  the  weight  of  ( ay,  )  may 
be  inversely  proportional  to  || x%  —  £||,  with  an  appropriate  adjustment  if  x  coincides 
with  an  Xi. 

A  justification  for  the  approach  follows  directly  from  Theorem  11.3  through 
the  next  proposition. 

Proposition  II. 5.  Suppose  that  the  assumptions  of  Theorem  II. 3  hold  and  that 
the  probability  distribution  of  ( X ,  Y)  is  degenerate  at  x  G  IRn+1  in  the  sense  that 
P{(X,Y )  <  (. x,y )}  =  (p(y),  for  all  y  G  M  and  x  >  x,  where  (p(y)  =  P{Y(x)  <  y}, 
and  P{(X,Y)  <  (x,y)}  —  0  otherwise. 

If  {(c o,  c")}^  is  a  sequence  of  optimal  solutions  of  SqRu ,  with  a  G  (0, 1),  then 
along  every  convergent  subsequence  we  have  that  Cq  +  ( c ",  h(x))  tends  to  qa(Y{x)). 

Proof. 

For  the  given  degenerate  distribution  of  (X,  Y),  Co  +  (c,  h(X))  =  cq  +  (c,  h(x)}  almost 
surely.  Consequently,  SqR  reduces  to  the  error  minimization  problem  of  Theorem 
II. 1  and  Co  +  ( c,h(x ))  =  qa(Y(x))  for  every  (co,c)  G  C.  The  conclusion  then  follows 
from  Theorem  II. 3. 

□ 

Suppose  that  the  weights  of  ( x^yi ),  i  =  1,2,  in  the  above  construction 
are  chosen  to  approximate  the  degenerate  distribution  of  Proposition  II. 5,  for  example 
by  setting  them  inversely  proportional  to  ||ay  —  £||.  Then,  in  view  of  Proposition  II. 5, 
a  solution  of  SqRu,  constructed  using  those  weights  as  an  artificial  probability  distri¬ 
bution  for  (X lu,Yu),  leads  to  an  approximation  of  the  considered  superquantile  at  x. 
Of  course,  this  procedure  can  be  repeated  for  different  points  x  to  generate  a  “global” 
assessment  of  qa(Y(x))  as  a  function  of  x  and  eventually  facilitate  optimization  over 
x.  Moreover,  the  process  can  be  repeated  with  new  or  augmented  sample  points  in 
a  straightforward  manner.  In  a  situation  where  a  sample  is  not  fully  randomly  gen¬ 
erated  but  x-points  are  determined  by  an  analyst,  the  approach  may  even  motivate 
scattering  those  points  near  a  point  of  interest  x  instead  of  concentrating  them  all  at 
x  exactly.  The  former  approach  certainly  results  in  a  better  “global”  understanding 
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of  a  superquantile  as  a  function  of  x,  but  may  prove  to  be  a  more  economical  route  to 
estimate  a  superquantile  at  x  too.  We  examine  this  situation  numerically  in  Chapter 
IV. 


C.  VALIDATION  ANALYSIS 

Regression  modeling  must  be  associated  with  means  of  assessing  the  goodness 
of  fit  of  a  computed  regression  vector.  The  process  of  validating  a  regression  fit  is 
important  as  it  allows  us  to  decide  whether  the  obtained  numerical  results  quantify 
how  well  the  model  explains  and  predicts  future  outcomes.  A  commonly  used  mea¬ 
sure  that  allows  such  assessment  is  the  coefficient  of  determination.  In  least  squares 
regression,  this  coefficient,  also  known  as  R.-squared,  is  defined  as 


R2 


SSRes 
SST  ’ 


where  SSRes  denotes  the  residual  sum  of  squares  and  SSt  the  total  sum  of  squares. 
While  R 2  cannot  be  relied  on  exclusively,  it  provides  an  indication  of  the  goodness  of 
fit  that  is  easily  extended  to  the  present  context  of  superquantile  regression.  In  our 
notation, 


R2  =  1 


E[Z(co,  c) 
a\Y) 


(11.35) 


and  similarly  when  passing  to  an  approximate  random  vector  (XU:YU).  From  Ex¬ 
ample  1’  in  Rockafellar  and  Uryasev  (2013),  we  know  that  the  numerator  in  (11.35) 
is  a  measure  of  error  applied  to  Z(c0,c )  and  that  its  denominator  corresponds  to 
the  measure  of  deviation  cr2(-).  Moreover,  the  minimization  of  that  error  of  Z(co,c) 
results  in  the  least  squares  regression  vector.  According  to  Rockafellar  and  Uryasev 
(2013),  these  measures  of  error  and  deviation  are  in  correspondence  and  belong  to  a 
family  of  risk  quadrangles  that  yields  the  expectation  as  its  statistic.  Therefore  we 
could  write  the  formula  for  R2  as  follows 


_  S(Z(c0,c)) 
V(Y) 


(11.36) 
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This  observation  motivates  the  following  definition  of  coefficient  of  determination 
applied  to  quantile  regression. 


Definition  II.  1.  In  quantile  regression,  the  coefficient  of  determination  of  a  regres¬ 
sion  vector  (cq,  c)  G  Mm+1  is  given  by 


where  Z(c0,c)+ 


ft'a(co,c) 


£a(Z(c0,c)) 

Va{Y) 

E  \iZ^Z(c0,  c)+  +  Z(c0,  c)_] 
qQ(V)  -  E[Y] 


=  max{0,  Z(c0,  c)}  and  Z(c0,  c)_  =  max{0,  —  Z(c0,  c)}. 


(11.37) 


In  least  squares  regression,  the  coefficient  of  determination  is  a  value  expressed 
between  zero  and  one,  which  leads  us  to  the  following  proposition. 


Proposition  II. 6.  For  a  regression  vector  (c o,  c)  G  M"l+1 
that 


0  <  Rg{CO,c)  <  1. 


and  a  G  (0,1),  one  has 
(11.38) 


Proof. 

By  the  definition  of  coefficient  of  determination  in  quantile  regression  and  of  quantile 
error  and  deviation  measures,  in  the  sense  of  Rockafellar  and  Uryasev  (2013),  we  have 
that 


Rl(c0,c)  =  1 


=  1 


=  1- 


£a(Z(c0,c)) 

Va{Y) 

Sa{Y-cQ-{c,h{X))) 

min  £a(Y-£) 

R 

£a(Y-c0-(c,h(X))) 


(11.39) 


£*(Y-Z*) 

where  is  an  optimal  solution  to  min^g/j  £a(Y  —  £).  Both  quantile  error  and 
deviation  measures  are  nonnegative  quantities,  which  proves  the  upper  bound.  Since 
the  regression  vector  (c0,  c)  G  R"  1  is  obtained  by 


min  £a  (Y  -  c0  -  (c,  h(X))) , 

(c0,c)eRm+1 
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we  guarantee  that 

£a(Y  -  c0  -  (c,  h(X)))  <  Sa(Y  —  £*), 
in  equation  (11.39),  which  gives  i?(((co,c)  >  0. 

□ 

Applying  the  same  idea  to  superquantile  regression,  we  obtain  the  following 
definition  of  the  coefficient  of  determination. 


Definition  II. 2.  In  snperqnantile  regression,  the  coefficient  of  determination  of  a 
regression  vector  (c0,  c)  G  R"  '  1  is  given  by 


Rl(c0,c) 


£a(Z(c0,c)) 

T>a(Y) 

yzy  fp  max  {0,  qg(Z(c0,  c))}  d/3  —  E  [ Z(c0 ,  c)] 
T hll^Y)d^~E[Y] 


(11.40) 


In  fact,  a  similar  definition  can  be  formulated  for  any  generalized  regression 
consisting  of  minimizing  an  error  of  Zf. 

The  bounds  on  the  coefficients  of  determination  for  least  squares  and  quantile 
regressions,  can  also  be  applied  to  the  coefficient  of  determination  in  the  superquantile 
regression  case,  using  the  same  arguments  as  in  the  previous  proof. 

Proposition  II. 7.  For  a  regression  vector  (co,c)  G  lRm+l  and  a  G  (0,1),  one  has 
that 

0  <  R2a(co,c)  <  1.  (11.41) 

Proof. 

By  the  definition  of  coefficient  of  determination  in  superquantile  regression  and  of 
superquantile  error  and  deviation  measures,  in  the  sense  of  Rockafellar  and  Uryasev 


47 


(2013),  we  have  that 


Rl(c0,c) 


£a(Z(c0,c)) 

t>a(Y) 

£a(Y-c0-(c,h(X))) 
min  £a(Y-£) 

£E-R 

£a(Y-c0-(c,h(X))) 

sa(Y-e) 


(11.42) 


where  is  an  optimal  solution  to  mhi£e#  £a(Y  —  £).  Using  the  same  arguments  as 
in  the  proof  of  Proposition  11.6,  we  arrive  at  the  conclusion. 


□ 


As  in  the  classical  case,  higher  values  of  R2a  are  better,  at  least  in  some  sense. 
Indeed,  SqR  aims  to  minimize  the  error  of  Z(c o,  c)  by  wisely  selecting  the  regression 
vector  (cq,c)  and  thereby  also  maximizes  R2a) 


ar grain £a  (Y  —  [co  +  (c,  /r(X))])  argrnax  R^(co,  c ).  (11.43) 

Co,C  Co,C 

The  error  is  “normalized”  with  the  overall  “nonconstancy”  in  Y  as  measured  by  its 
measure  of  deviation  to  more  easily  allow  for  comparison  of  coefficients  of  determina¬ 
tion  across  data  sets. 

However  it  is  possible  to  obtain  large  coefficients  of  determination  by  adding 
explanatory  terms  to  a  regression  model,  i.e.,  increasing  m,  but  without  necessarily 
achieving  a  more  useful  model.  Hence,  it  is  usual  in  least  squares  regression  to  also 
evaluate  an  adjusted  coefficient  of  determination  that  penalizes  any  term  added  to 
the  model  that  does  not  reduce  variability  substantially.  This  quantity  only  increases 
if  a  new  term  reduces  SS-^es/{v  —  m  —  1)  as  seen  by  the  definition 

d2  _1  SSRes/(u  -  m-1) 

Adj  SST/(u  -  1) 

where  v  is  the  number  of  observations.  Naturally,  then,  we  define  an  adjusted  coeffi¬ 
cient  of  determination  for  quantile  and  superquantile  regressions  similarly  in  the  case 
where  the  distribution  of  (A",  Y)  has  a  finite  support  of  cardinality  v. 
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Definition  II. 3.  In  quantile  regression,  the  adjusted  coefficient  of  determination  of 
a  regression  vector  (cq,c)  G  Mm+ 1  is  given  by 


-^o,Adj  (CO,  c) 


£a(Z(c0,c))/(u  -  m  -  1) 

pa(y)/(i/-i) 


(11.45) 


Definition  II. 4.  In  superquantile  regression,  the  adjusted  coefficient  of  determina¬ 
tion  of  a  regression  vector  (c0,  c)  G  M'n+1  is  given  by 


-^o,Adj  (CO,  c) 


£a(Z(c0,c))/(u  -  m  -  1) 

pa(y)/(i/-i) 


(11.46) 


Again,  similar  expressions  are  available  for  other  generalized  regression  techniques. 

When  performing  least  squares  regression  analysis,  we  have  other  commonly 
used  validation  methods.  These  include  computing  the  Cook’s  distance  for  each 
observation  used  in  the  model,  which  provides  an  estimate  on  how  an  observation 
influences  the  obtained  regression  fit.  This  distance  allows  the  decision  maker  to 
easily  understand  which  observation  might  be  considered  an  outlier  and  which  should 
be  checked  for  validity.  In  least  squares  regression,  the  Cook’s  distance  for  observation 
i  is  defined  as 


„  (/(x)  -  fc>(x)y  (/(x)  -  f«>(x)y 

•  mMSE  m  (f(X)  -  Y)2  '  {  '  1 

where  /W(-)  represents  the  fitted  regression  function  without  observation  i,  and  MSE 
denotes  the  mean  square  error  of  the  regression  model.  Using  the  measure  of  error 
that  is  corresponding  to  the  expectation  as  the  statistic,  and  similar  to  the  approach 
in  (11.36),  we  could  write  the  formula  for  the  Cook’s  distance  in  (11.47)  for  observation 
i  as  follows 


£(S(X)  -  /wpp) 

'  ■  m  £(  f(X)  -  Y) 

With  this  in  mind,  we  next  define  Cook’s  distances  applied  to  quantile  and  superquan¬ 
tile  regressions. 
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Definition  II. 5.  In  quantile  regression,  the  Cook’s  distance  estimates  for  a  regression 
vector  (cq,c)  G  lRm+l  is  given  by 


-^h,a(*-Cb  E) 


m  £a(f{X)  -  Y ) 

E  [j^{f(X)  -  /«(X)}+  +  {/(X)  -  /«(X)}_] 


mE  [^{/(X)  -  X}+  +  {/(X)  -  X}_] 


(11.49) 


where  Y+  =  max{0,  X}  and  Y_  =  max{0,  —  Y}. 


Definition  II. 6.  In  superquantile  regression,  the  Cook’s  distance  estimates  for  a 
regression  vector  (c0,  c)  G  Mm+1  is  given  by 


Ei^a^Co,  c) 


Sa{f{X)  -  f®{X)) 
m  £a(-Z(c0,c )) 

A  fo  gq(/ W  -  /W (X))}d/3  -  g/(X)  -  /W(X)] 

fo  max{0,  qa(-Z(c0,  c))}dj3  +  E[Z(c0,  c)] 

(11.50) 


As  shown  in  Section  II. B,  we  only  use  one  assumption  when  building  our 
superquantile  regression  problem,  finite  second  moments  for  the  random  variables. 
This  generalization  allows  the  regression  problem  to  be  applied  in  many  situations, 
but  makes  validating  the  obtained  model  a  harder  process. 

For  the  scope  of  the  dissertation  we  do  not  develop  other  model  validation 
techniques  since  we  discard  many  of  the  commonly  used  model  assumptions,  such 
as  normality,  or  homoscedastic-ity,  that  are  usually  requirements  for  such  assessment 
tests.  However,  we  recall  that  cross-validation  is  a  tool  to  take  into  account  for 
validating  the  regression  model,  especially  for  larger  sample  sizes  u.  Obviously,  when 
the  sample  size  is  small  and  we  choose  a  high  probability  level  a,  subdividing  the 
sample  into  training  and  testing  data  sets  is  not  a  wise  decision. 

In  the  next  chapter,  we  develop  computational  methods  that  allow  us  to  im¬ 
plement  these  theoretical  results. 
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III. 


COMPUTATIONAL  METHODS 


In  this  chapter,  we  develop  computational  methods  that  allow  us  to  solve  the 
superquantile  regression  problems  of  Section  II. B.  This  computational  task  consists  of 
solving  the  convex  optimization  problem  SqR,  or  in  practice  the  approximate  problem 
SqRu  due  to  incomplete  distributional  information. 

In  the  next  two  sections,  we  describe  convenient  means  for  solving  the  su¬ 
perquantile  regression  problems  when  (XU,YU)  has  a  discrete  joint  distribution  with 
v  possible  realizations.  Regardless  of  the  distribution  of  (XU:YU),  a  reformulation  of 
the  approximate  problem  SqRu  in  terms  of  the  deviation  measure  Va  is  beneficial.  In 
view  of  Proposition  II. 3,  the  task  of  determining  a  regression  vector  (cq,  c")  reduces  to 
that  of  minimizing  'D01(Zq(-)),  obtaining  cv  as  an  optimal  solution,  and  then  setting 

Since  it  is  straightforward  to  compute  every  superquantile  of  a  random  variable 
Y  with  a  discrete  probability  distribution,  as  follows 

V 

E  PjVj  if  a  =  0, 

j= i 

(i  \  v  i— 1  i 

E  ip  ~  a  )  Vi  +  E  PjVj  if  E  pj  <  «  <  E  Pj  <  !, 

3= 1  /  3=i+ 1  J  3= 1  J  =  1 

Vu  if  a  >  1  -  p„, 


with  pj  being  the  corresponding  probabilities  of  the  realizations  yj  of  Y,  which  are 
ordered  from  smaller  to  larger,  we  only  focus  on  the  minimization  problem,  which 
takes  the  following  form: 
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We  denote  these  computational  methods  by  primal  methods  since  we  compute 
the  regression  vectors  solving  the  original  problem.  The  material  in  Section  III. A  is 
to  a  large  extend  based  on  our  paper  Rockafellar  et  al.  (2014). 

In  Section  III.B,  we  use  a  different  approach  that  relies  on  the  dualization  of 
risk  and  using  the  theory  developed  in  Rockafellar  and  Royset  (2014c),  we  generate 
a  computational  method  that  we  denote  as  the  dual  method. 

A.  PRIMAL  METHODS 

The  next  subsections  describe  two  primal  computational  methods  for  solving 
DSqRu.  The  first  one  solves  our  problem  by  analytical  integration,  while  the  second 
one  utilizes  numerical  integration  techniques. 

1.  Analytical  Integration 

At  first  one  might  get  the  impression  that  numerical  integration  is  required 
for  solving  DSqRu,  but  this  may  not  actually  be  needed  as  we  show  next.  Suppose 
that  (XU,YU)  has  a  discrete  distribution  with  support  (ad,  2/J),  j  =  1,2,..., u  and 
probability  of  occurring  P{(XU,YU)  =  (ad,?/-7)}  =  1/v  for  j  =  l,2,...,v.  This  is  the 
case  we  typically  encounter  in  applications,  where  (ad,?/-7),  j  =  1,2,...,  v,  is  the  data 
assumed  to  be  equally  likely  to  occur.  We  then  obtain  significant  simplifications  in 
the  approximate  regression  problem  DSqRu. 

For  any  fixed  c  G  lRm,  the  cumulative  distribution  function  of  Zq(c)  is  a 
piecewise  constant  function  with  at  most  v  steps.  The  range  of  the  distribution 
function  is  (0, 1/z/,  2/z/, ...,  1}  or  a  subset  thereof.  By  partitioning  the  integral  over  /3 
in  DSqRv  according  to  this  range,  and  accounting  for  the  fact  that  the  integral  starts 
at  a,  we  can  then  rewrite  the  optimization  problem  in  this  case  as 

min  — y  f'  qe(ZS(c))dp  -  B[ZJ(c)].  (III.l) 

cG/t  1  —  (X  —  Jo. 

i=Va.  Pi—i 

where  ua  :=  [zzq;],  with  [a]  being  the  smallest  integer  no  smaller  than  a  G  R, 
f3va- i  =  a,  and  A  =  i/v,  for  i  =  ua,  ua  +  1, ...,  u. 
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We  recall  that 


qa(Y)  G  argmin  {c0  +  Va(Y  -  c0)} 

Co  G-R 

qa(Y)  =  min  {c0  +  Va(Y  -  c0)}  . 

coS-R 

Consequently, 

it>(ZS(c))  =  minUp  +  Y-p  E{msx{ZZ(c)  -  Up,0}}  (III.2) 

=  1n(zo(c))  +  YZTfj  Elmax(zo(c)  ~  1f>(zo(c)),0}] 

for  each  (3  G  [0, 1). 

The  special  piecewise-constant  structure  of  the  cumulative  distribution  func¬ 
tion  of  Zq(c)  implies  that  qp^Z^c))  is  constant  as  a  function  of  f3  on  the  intervals 
(/3j_i,/3i),  for  every  i  =  ua,  ua  +  1, u.  Consequently,  Up,  for  [3  G  (a,  1)  in  equation 
(III. 2)  can  be  replaced  by  a  finite  number  of  variables  so  that  equation  (III.l)  takes 
the  form 


l  (  1 

min  -  >  /  min  (  hH - -MmaxjZ/Yc 

cei?m  1  ^  Jpt_x  uteR  V  1  -P 

L  —  "CX. 


V,o}] 


dP  -  E[zpc)\. 


The  last  integral  simplifies  further  since  for  f3  G  {f3v-i,  f3v)  =  (1  —  1/v,  1),  we  have 
that 


qp{Z o(c))  =  M(c )  :=  .  max  y 3  -  ( c,xJ ), 


and  therefore 
1 


1  —  a 


I'Pv 

fiv-1 


mm 

UV£R 


Uu  +  i—pE[max{ZZ(c)  -  C/,,0}]  )  dp 


1  —  a 


M  (c) 


M(c ) 


/  n  v 

//3,-i  n1  -  «) 


Consequently,  equation  (III.l)  takes  the  form 


"-1  /-ft 


nun 


ceRm  1  —  a 


YL  min  [  Ui  +  1  _  —  £J[max{Zo  (c)  -  Uh  0}]  j  df3 


l-(3 


+  ~  E[z°(c)y 
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The  order  of  minimization  is  immaterial  and  we  can  equivalently  consider 


f  (Ui  +  YZ-~E[max{Z% (c)  -  Uh  0}]^  d/3 

a  i=l/a  •'Pi-1  \  >  ' 

M(c) 


mm  - 

c£R"\  UeRv-va  1 


-  E[ZZ(c)l, 


u(  1  —  a ) 

where  we  let  U  =  (UVa,  U„a+U  ...,  U„- 1)  G  Ru~Ua. 

In  order  to  simplify  the  notation  in  our  minimization  problem,  we  define  a*, 
for  i  =  ua,  va  +  1, v  —  1,  as  follows 

rPi  1 


d;  := 


-d/3  =  log(l  -  A- 1)  -  log(l  -  A)- 


'ft_i  1  -  P 

Using  this  notation,  equation  (III.  1)  simplifies  even  further  to 


mm 


c£Rm,  UeRv~Va  1  —  01 


-  V  —  l  ^  V—  1 

VI  (A  -  Pi-i)Ui  +  __  y  U[max{Zo(c)  -  £/*,()}]  a* 


M  (c) 
i/(l  —  a) 


E[ZoM]. 


By  introducing  another  set  of  auxiliary  variables  and  using  the  standard  transcrip¬ 
tion  technique  for  handling  max- functions,  we  reach  the  linear  program  P//P  that 
implements  analytical  integration. 
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Problem: 


P 


LP  • 


1 

u-l 

-t 

v—\ 

V 

min 

l,U,V,W 

1 

1  —  a 

M 

Jo 
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A- 

1  )Ui 

+ 

1 

v(l- 

a\ 

i  =  Voi 

i=Va 

3= 1 

+ 

w 

v(l  - 

■  a) 

-jy- 

V  t—* 

3= 1 

(c,h(xj))) 

S.t. 

y3 

—  (c,  h(xj 

))■ 

-  Ui 

< 

Vij ,  i 

=  "a,  ■ 

■■,V~  1  ,j  : 

=  l,...,z/ 

0 

< 

Vij ,  i 

=  "a,  ■ 

■  ■,V-  1  ,j  ■ 

=  !,•••,  v 

yJ  -  {(■■ 

M 

X3)) 

< 

w,  J  ■ 

=  !,■■■ 

,V 

c 

g 

Rm 

u  - 

=  (UVa  I'-' 

v-l) 

G 

JRv~Va 

v  = 

iv^a,  1;  ■  •  •  ) 

Vv- 

-l,v) 

G 

1 

£ 

> 

w 

G 

R. 

This  equivalent  reformulation  of  DSqRu  involves  m  +  (u  —  va){y  +  1)  +  1 
variables  and  2(y  —  ua)u  +  v  inequality  constraints.  In  practice,  with  the  probability 
level  a  being  set  close  to  1,  va  —  \isa]  may  be  close  to  the  number  of  observations  v. 
Consequently,  the  linear  programming  problem  P^p  becomes  large-scaled  when  the 
sample  size  v  is  large  and  decomposition  algorithms  may  be  needed. 

Alternatively,  we  consider  next  a  numerical  integration-based  scheme  that 
avoids  some  auxiliary  variables  and  constraints,  and  also  handles  the  situation  where 
the  distribution  of  (Xu,  Yu)  is  not  uniformly  discrete. 

2.  Numerical  Integration 

The  integral  in  DSqRu  is  easily  approximated  using  standard  numerical  in¬ 
tegration  techniques.  Suppose  that  the  interval  [a,  1]  is  divided  into  fi  subintervals, 
where  a  <  (30  <  f3i  <  ...  <  /3M_i  <  (3^  <  1  and  Wi  >  0,  i  =  0, 1,  are  factors 
specific  to  the  integration  scheme.  An  approximation  of  DSqRu  then  takes  the  form 
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Problem: 

PNum  ’  :  mm  J2  wm  ( zo  (c) )  -  E  [Zl (c)] . 

C  a  i=0 


For  large  \i,  an  optimal  solution  of  problem  is  close  to  that  of  DSqRu, 

as  seen  next,  under  conditions  that  are  satisfied  by  essentially  all  commonly  used 
numerical  integration  schemes. 


Proposition  III.  1 .  Suppose  that  for  any  continuous  function  g  :  [a,  1]  — >•  M,  a 
numerical  integration  scheme  with  discretization  points  a  <  j30  <  /3i  <  ...  <  i  < 
13 n  <  1  and  weights  Wi  >  0,i  =  0, 1, ...,  p,  satisfies 


£ 

i= 0 


Wi9{Pi)  ~  /  g(/3)d(3 


->•  0 


as  pL  — *  oo  .  Let  6e  a  sequence  of  optimal  solutions  of  under  this 

numerical  integration  scheme.  Then,  every  accumulation  point  of  {ciy,M}^=1  is  an 
optimal  solution  of  DSqRu. 


Proof: 

For  any  c  G  Mm ,  qy^Zf^c))  is  finite  and  continuous  as  a  function  of  j3.  Consequently, 
the  assumption  on  the  numerical  integration  scheme  applies  and  the  objective  function 
of  Pfilfff  converges  pointwise  to  that  of  DSqRu,  as  p  — *  oo. 

The  objective  functions  are  also  finite  and  convex  in  c,  which  follows  directly 
from  the  convexity  of  qa  on  £2  and  the  affine  form  of  Zlf  as  a  function  of  c.  Conse¬ 
quently,  by  Theorem  7.17  in  Rockafellar  and  Wets  (1998),  the  objective  function  of 
-^Nun-T  epiconverges  to  that  of  DSqRu  and  the  conclusion  follows  from  Theorem  7.31 
in  Rockafellar  and  Wets  (1998). 

□ 

While  specialized  solvers,  such  as  Portfolio  Safeguard  in  American  Optimal 
Decisions,  Inc.  (2011),  handle  directly  with  little  difficulty  under  many  circum¬ 

stances,  the  problem  is  typically  nonsmooth  and  standard  nonlinear  programming 
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solvers  may  fail.  However,  following  a  simple  reformulation  of  P^m  >  utilizing  equa¬ 
tion  (II. 7),  we  obtain  the  equivalent  linear  program  formally  stated  below,  where  we 
assume  for  convenience  that  /^  <  1. 


Problem: 


pp,/z,u; 

rNum,LP  : 


1 

mm - 

c,u,v  1  —  OL 


Y w>  ^  + 

s.t.  y3  —  (c,  h(x3))  —  Ui 


0 


c 

u  (rio,  u± ,  •  •  • ,  Up) 

V  =  (u0,i,  ...iV) 


< 

< 

G 

G 

G 


i=i 

Vij ,  *  =  0, 1, ...,//  ,j  = 

vij,  i  =  0,  1, ...,//  ,j  =  l,...,z/ 

JRm 

W+l 


If  A  =  1,  then  a  straightforward  modification  is  required  based  on  the  fact  that 
5i(Zq(c))  =  ma Xj=12j...,uy3 —  (c,x3).  This  linear  program  consists  of  m+fj,+l+u(/j,+l) 
variables  and  2z/(/i  +  l)  constraints,  which  may  be  substantially  less  than  what  follows 
from  the  analytical  integration  approach  for  large  u.  Here  we  assume  that  the  weights 
Wi  >  0,  i  =  0,  l,...,/i,  are  given  and  therefore  not  accounted  for  in  the  complexity 
analysis  results.  For  example,  in  Chapter  IV  we  assume  that  the  /x  +  1  subintervals 
have  the  same  weights.  And  in  practice,  we  find  that  a  moderately  large  /x  suffices  as 
shown  in  the  numerical  examples  discussed  in  the  same  chapter. 

B.  DUAL  METHODS 

We  now  turn  to  a  distinct  perspective  towards  the  alternative  superquantile 
regression  problem  DSqR.  We  use  the  theory  of  the  dualization  of  risk  to  build  a 
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dual  problem  as  described  in  the  next  subsection.  We  then  solve  this  new  problem 
using  different  algorithms,  as  seen  in  Subsections  III.B.2  through  III.B.4. 

1.  Dualization  of  Risk 

We  start  this  subsection  by  recalling  the  risk  measure  TZa  corresponding  to  the 
superquantile  as  the  statistic.  According  to  equation  (11.18),  the  measure  of  deviation 
for  our  superquantile-based  quadrangle  is  described  as  follows 

If1 

Va{Y)  =  na(Y)  -  E  [Y]  =  ——  /  qp(Y)dP  -  E  [Y]  =  fQ(Y)  -  E  [Y] ,  (III.3) 

*  ®  Ja 

where  1Za(Y)  =  qa(Y )  is  the  risk  measure  for  which  we  build  the  dual. 

Next  we  turn  to  the  dualization  of  risk  measures  and  derive  results  that  we 
can  apply  to  our  deviation-based  superquantile  regression  problem  DSqR.  By  the 
Envelope  Theorem  in  Rockafellar  and  Uryasev  (2013),  an  alternative  formula  for  a 
positively  homogeneous  regular  risk  measure  Tl(-)  is  given  by  its  dual  representation, 
described  as  follows 

K{Y)  =  sup{£[YQ]},  (HI. 4) 

QeQ 

where  Q  is  a  nonempty  closed  convex  set  that  is  to  the  risk  envelope  associated  with 
7Z.  For  Y  G  £2  and  a  G  (0, 1),  a  Q}  that  maximizes 

sup  {E  [YQ ]} 

QeQ 

is  called  a  risk  identifier.  If  Qy  is  a  risk  identifier,  then  obviously 

n{Y)  =  E[YQy].  (hi. 5) 

Clearly,  when  we  have  a  risk  measure  TZ(Y)  —  Y[Y] ,  we  get  Q  =  1.  And  for  7 Z(Y)  = 
supY,  we  obtain  Q  G  {Q  G  C2  \  Q  >  0 ,E[Q\  =  1}. 

For  the  general  treatment  of  risk  identifiers,  we  refer  to  Rockafellar  and  Royset 
(2014c).  We  consider  the  case  where  hi  is  finite  and  P({o;})  >  0,  for  u  G  11,  to  avoid 

technical  issues  regarding  measurability.  We  let  O^Y)  =  (w  G  O  |  Y(cu)  =  (^(Y)},  for 
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f3  £  (0, 1),  and  Ffi  (y)  denote  the  left-continuous  point  of  the  cumulative  distribution 
function  Fy. 

Below  we  derive  a  risk  identifier  formula  for  the  superquantile  at  Y  and  prob¬ 
ability  level  (3  £  (0, 1). 


Proposition  III. 2.  (Rockafellar  &  Royset,  2014c)  For  (3  £  (0, 1)  and  Y  £  C2.  a  risk 
identifier  for  qp(Y)  is  given  by 


iri 


0 


v 


ifY{u>)>qp(Y), 
ifY(u)  =  qy{Y), 
otherwise, 


with  0  <  rp{uf)  <  1/(1  —  /3)  for  u>  £  ft  such  that 

Fy(qp{Y))  —  (3 


'typo 


rp{uf)dP{uf)  = 


1-/3 


(III. 6) 


We  now  turn  to  the  risk  identifier  for  our  choice  of  measure  of  risk  in  problem 
DSqR ,  the  o-second-order  superquantile.  We  interpret  0  times  —  oo  as  zero.  Let  Qfi 
be  a  risk  identifier  for  qa  (Y). 


Proposition  III. 3 

distribution  with  v 
identifier  of  qa{Y), 


.  (Rockafellar  &  Royset,  2014c)  Suppose  that  Y  has  a  discrete 
possible  realizations.  Then,  for  a  £  (0, 1)  and  Y  £  C2,  a  risk 
is  given  by 
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( 


1 

1—a 


log 


1—a 

1  -Fy(Y(u)) 


1 

1— a 


1— a 

1  -F~(Y(u)) 


+  1 


ifa<Fy{Y{u>))  =  Fy{Y{u>))<l 
if  a  <  Fy(Y{u))  <  Fy(Y(u))  =  1 


i 

1—a 


1 —a 

1  -F~(Y(u)) 


+  1 


CM 


1  -Fy(Y{u)) 
Fy(Y(oj))-F~(Y(uj)) 


log 


1-Fy(Y(u))  ] 
1  -F~(Y(u))  J 


< 


1 

1—a 


FY(Y(u))—a 

Fy(Y(u))-F-(Y{u)) 


ifa<Fy(Y(u))<Fy(Y(u)) 
ifFy(Y(u ))  <  a  <  Fy{Y{u))  =  1 


l 

1—a 


FY(Y(u))-a 

Fy(Y(u))-F~{Y{w)) 


+ 


l-Fy(FH)  1  1  -Fy{Y{u))  1 

Fy(Y(lj))-F~(Y(u))  1—a  J 


ifFy(Y{u>))  <a<  Fy{Y{uj)) 
and  Fy(Y(u))  <  Fy(Y(u)) 


^  0  otherwise. 

In  view  of  Theorem  4.13  in  Rockafellar  and  Royset  (2014c),  and  equations 
(III. 3)  and  (III. 4),  we  are  now  able  to  build  a  dual  method  to  solve  the  Deviation- 
based  Superquantile  Regression  Problem  DSqR. 

Consider  the  risk  identifier  Qal(c>  of  qa(Z0(c)),  as  defined  in  Proposition  III. 3, 
for  a  probability  level  a  G  (0, 1).  Then,  according  to  equation  (III. 3),  we  have  that 


Va(Z0(c))  =  na(Z0(c))-E[Z0(c)] 

=  UZo(c))-E[Z0(c)} 

=  E[Z0(c)Qzq °W]  -  E  [Z0(c)\ .  (III.7) 


And  we  are  able  to  define  the  objective  function  of  this  new  problem  as  follows 

m  =  (1118) 

i=l  j= 1 

=  l  Y.  (y(<}  -  <c-  m*®)»  <3?(c)w  -  -  Y  (</  -  (c,  . 

V  i=  1  V  3= 1 

where  Z0(cM  is  the  it/l-ordered  value  of  Z0(c).  The  evaluation  of  the  objective  func¬ 
tion  requires  the  computation  of  Qa°(c'1 .  According  to  Proposition  III. 3,  this  implies 
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sorting  vector  Z0(c )  for  a  given  c  to  obtain  its  cumulative  distribution  function  and 
only  then  evaluate  Qa°^\  using  the  same  sorting  as  for  Z0(c)^.  A  subgradient  of 
/(c)  is  then  easily  computed  as  follows 

V/( (c)  =  P(i>)Q?W(i)  +  \  E  h  60  ’  (HI-9) 

i=l  j=l 

with  h  (x^)  maintaining  the  same  ordering  as  in  Z0(c )W  used  in  (III. 8). 

The  Approximate  Dualization-of-risk  Superquantile  Regression  Problem  Du  is 
defined  as: 


Problem: 

D-  :  mm  A  (Zq(c))  =  ZJWW<5f“W(0  -  -  £  Z«6)t 

*=i  i=i 

with  Cq  being  obtained  by  setting  Cq  =  qa{ZQ  (cu)),  and  Qa0^  given  by  Proposi¬ 
tion  III. 3. 

We  now  turn  to  the  implementation  of  these  results.  In  the  next  subsections  we 
present  three  algorithms  that  are  well  known.  First  we  start  with  a  simple  algorithm, 
the  subgradient  method,  and  then  move  to  an  heuristic  algorithm,  the  coordinate 
descent  method,  and  finish  off  with  the  cutting  plane  method.  There  are  obviously 
many  other  possible  algorithms  we  could  implement  when  solving  the  dual  methods, 
but  we  omit  such  investigation  and  only  discuss  these  three  as  examples. 

2.  Subgradient  Method 

The  subgradient  method  was  originally  developed  by  Naum  Z.  Shor  and  oth¬ 
ers  in  the  1960s  and  1970s;  see  Shor  (1985).  It  is  a  simple  algorithm  that  can  be 
implemented  for  solving  a  wide  variety  of  problems,  such  as  the  minimization  of 
nondifferentiable  convex  functions. 

The  subgradient  method  is  an  iterative  algorithm  that  aims  to  minimize  a 
convex  function  /,  by  iteratively  obtaining  a  new  ck+1  according  to  the  following 


61 


scheme 


ck+1  =  ck -  5kVf(ck),  (III.  10) 

where  V/(cfc)  is  any  subgradient  of  /  evaluated  at  ck,  and  5k  is  the  stepsize  used  in 
iteration  k.  As  a  downside,  this  algorithm  is  not  a  descent  method  and  it  is  possible 
to  obtain  increased  objective  function  values  in  any  iteration,  therefore  we  need  to 
store  the  best  obtained  objective  function  value  by  setting  fkest  =  min  { ,fb~l ,  fbest}- 
In  fact,  if  we  obtain  the  best  function  value  so  far  in  iteration  /c,  we  also  need  to  store 
ikest  =  k.  This  way  we  guarantee  to  have  fkest  =  min  {/(c1),  /(c2), . . . ,  /(cfc)}  stored 
for  later  use. 

There  are  obviously  many  rules  to  define  the  stepsize  used  in  algorithm  SM, 
as  we  describe  below.  For  example,  one  could  use  a  step  with  constant  length  instead, 
$k  =  5/||  V/(cfc)||2,  so  that  ||cfc+1  —  ck || 2  =  5,  or  perhaps  a  diminishing  stepsize,  such 
as  5^  =  71  / [k  +  72),  with  71  and  72  being  some  positive  scalars.  The  importance  of 
the  right  choice  of  stepsize  5k  becomes  more  aparent  when  we  discuss  computational 
performances,  later  in  Section  III. C. 

We  now  formally  describe  the  subgradient  method. 


Algorithm  SM: 

Step  0.  Choose  an  initial  guess  c°  G  Mm.  Set  k  :=  0. 

Initialize  /fe°est  :=  00,  and  i°best  :=  0. 

Step  1.  Compute  /(cfc)  and  V/(cfc),  using  Equations  (III. 8)  and 
(III. 9),  respectively. 

If  V/(cfc)  =  0,  then  ck  is  an  optimal  solution,  stop. 

Step  2.  Set  fj;est  =  min  {/£~\  f{ck)},  and  let  i\est  =  k  if  f(ck )  =  f£est. 
Step  3.  Choose  stepsize  5k,  with  5k  >  0. 

Step  4.  Define  ck+1  =  ck  —  5fcV f(ck). 

Replace  k  by  k  +  1  and  go  to  Step  1. 
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3.  Coordinate  Descent  Method 

The  coordinate  descent  method  is  an  heuristic  algorithm  that  is  simple  to 
implement.  In  this  method,  the  objective  function  is  minimized  along  one  coordinate 
direction  per  iteration  and  a  cycle  is  complete  when  all  coordinates  have  been  utilized 
in  this  process.  Although  we  could  define  any  permutation  of  coordinates  as  the  order 
for  the  coordinate  search,  we  will  use  the  cyclical  order  for  simplification.  We  benefit 
from  the  possibility  of  computing  the  subgradient  of  the  objective  function,  as  defined 
in  (III. 9),  to  perform  line  search  in  each  coordinate  direction. 

We  now  formally  describe  the  coordinate  descent  method. 

Algorithm  CDM: 

Step  0.  Start  with  an  initial  guess  c°  G  JRm. 

Set  the  cycle  counter  k  1. 

Step  1.  Choose  coordinate  1  and  compute  c\  G  argminCi  f(c\,  ck  1, . . . ,  ckn  1). 

Step  2.  Choose  coordinate  2  and  compute  ck  G  argminC2  f(c\,  C2, _ ,  1). 

Step  m.  Choose  coordinate  m  and  compute  G  argminCm  f(ck,  ck, . . . ,  ckm  1 ) . 

Replace  cycle  k  by  k  +  1  and  go  to  Step  1. 

This  algorithm  terminates  according  to  the  threshold  tolerance  e  >  0,  inputted 
by  the  decision  maker.  For  simplicity,  we  use  the  formula  /(c^1)  —  f(ck )  <  e  as  our 
stopping  criteria. 

4.  Cutting  Plane  Method 

We  finish  Section  III.B  by  describing  the  third  algorithm  we  implement  in  the 
numerical  examples,  in  Chapter  IV:  the  cutting  plane  method,  which  is  guaranteed 
to  achieve  an  optimal  solution  if  one  exists. 

The  idea  behind  this  algorithm  is  to  solve  an  approximate  linear  program 
each  iteration.  The  cutting  plane  method  starts  off  with  our  original  unconstrained 


63 


problem  and  with  every  iteration  we  obtain  a  cut  to  the  feasible  region  that  we  add 
as  a  new  constraint  for  the  following  linear  program.  So  it  approximates  the  feasible 
region  by  a  finite  set  of  closed  half  spaces  and  solves  a  sequence  of  approximating 
linear  programs  until  the  optimal  solution  is  found.  As  we  notice,  the  size  of  the 
linear  program  grows  with  the  number  of  iterations  and  becomes  rather  slow  for  a 
larger  number  of  variables. 

The  cutting  plane  method  is  usually  used  in  integer  or  mixed  integer  linear 
programming  problems  but  is  also  very  popular  when  applied  to  convex  minimiza¬ 
tion  problems  whenever  the  objective  function  value  and  its  subgradient  are  easily 
computed,  as  we  describe  in  detail  below.  Consider  our  minimization  problem 


min 

ceRm 


/(c) 


litz°  W(i,Q?(c)W-)XZoMJ. 


Using  V/(c°),  see  Equation  (III. 9),  at  an  initial  guess  c°  G  Mm,  we  are  able  to  build 
a  relaxation  to  our  problem,  as  follows 


min  £ 

be 

s.t.  / (c°)  +  Vf  (c°)T  (c  —  c°)  <  £,  (III. 11) 

with  £  e  JR  being  a  dummy  variable.  If  we  keep  adding  a  new  constraint  per  it¬ 
eration  k,  as  in  (III. 11),  but  now  applied  to  the  obtained  optimal  solution  cfc_1,  we 
construct  the  linear  programming  problem  with  K  constraints,  where  K  denotes  the 
total  number  of  iterations, 


min  £ 

£,c 

s.t.  / (ck)  +  V/ (ck)T  (c  —  ck)  <  £,  k  —  0, . . .  ,K. 


We  now  formally  state  the  cutting  plane  method. 
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Algorithm  CPM: 


Step  0.  Start  with  an  initial  guess  c°  G  Mm.  Set  k  :=  0. 

Step  1.  Compute  f(ck)  and  V/(cfc),  using  Equations  (III. 8)  and 
(III. 9),  respectively. 

If  V/(cfc)  =  0,  then  ck  is  an  optimal  solution,  stop. 

Step  2.  Solve  the  Linear  Program 
min  £ 

£,c 

s.t.  f(c?)  +  V/(ci)T(c-ci)  <  £,  i  —  0, . . .  ,k. 
Step  3.  Get  obtained  optimal  solution  c  from  Step  2  and  set  ck+1  =  c. 
Replace  k  by  k  +  1  and  go  to  Step  1. 


In  the  next  section,  we  compare  computational  performances  of  the  algorithms 
we  present  in  CHapters  III. A  and  III.B.  We  also  compare  these  complexity  results 
with  least  squares  and  quantile  regression  in  order  to  understand  how  good  these 
presented  computational  methods  are. 

C.  COMPLEXITY 

In  the  previous  two  sections  we  present  different  computational  methods  for 
the  superquantile  regression  problem.  When  implementing  these  methods,  it  is  useful 
to  know  how  efficient  and  costly  they  are.  In  this  section,  we  compare  primal  ver¬ 
sus  dual  methods  in  terms  of  worst-case  complexity,  and  analyze  the  computational 
performances  of  least  squares  and  quantile  regressions. 

1.  Least  Squares  Regression 

In  the  case  of  least  squares  regression  we  have  a  system  with  v  linear  equa¬ 
tions,  due  to  the  v  observations  in  the  data  set,  and  m  +  1  unknown  coefficients, 
(c0,ci, . . . ,  cm).  We  let  X  be  a  design  matrix  of  dimension  v  by  (m  +  1),  with  all 
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elements  in  the  first  column  being  set  equal  to  1  in  order  for  us  to  be  able  to  include 
the  intercept  Co  in  the  regression  model. 

Then  the  best  fitting  coefficients  (co,  c)  are  the  ones  obtained  by  solving  the 
quadratic  minimization  problem 


min  ) 

(c0,c)e-Rm+1  “j' 


s/‘ 

3= 1 


Vcj 


and,  in  matrix  notation,  are  equal  to 


5 


(c0,c)  =  (XTX)~1XTy. 


(III. 12) 


In  terms  of  computational  cost  this  algorithm  implies:  multiplying  XT  by  X,  which 
takes  0(v(m  +  l)2)  arithmetic  operations;  multiplying  XT  by  y,  which  takes  another 
0(v(m+  1))  arithmetic  operations;  computing  the  LU  factorization  of  (A"TX),  which 
takes  another  0((m  +  l)3)  arithmetic  operations;  and  finally  multiplying  (XTX)~1 
by  ( XTy ),  which  takes  0((m  +  l)2).  So  overall  the  running  time  of  this  procedure  is 
0(um2),  assuming  of  course  that  v  >  m  and  X  is  a  full  rank  matrix. 


2.  Quantile  Regression 

As  discussed  in  Subsection  II. A. 3,  the  quantile  regression  function  is  obtained 
by  minimizing  the  (scaled)  Koenker-Bassett  error  measure  (Koenker,  2005).  This 
problem  can  be  rewritten  as  a  linear  program  as  follows 


min  a  lj  u  +  (1  —  a)  lj  v 

co,c,u,v 

s.t.  Co  +  (c,  h(x*))  +  Ui  -  Vi 

— 

y\ 

c0 

G 

M 

c 

G 

JRm 

U  =  (ill ,  ...,UU) 

G 

Ru 

V  =  (ui, . .  .,vu) 

G 

Ru, 

where  lj  denotes  a  transposed  ^-dimensional  vector  of  ones.  This  linear  program  has 
a  total  of  2v  T  m  T  1  number  of  variables  and  v  number  of  equality  constraints.  For 
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us  to  be  able  to  proceed  with  the  computational  performance  analysis,  we  need  to 
transform  the  problem  into  standard  form.  Summarizing  we  then  have  2(2 u  +  m  +  1) 
variables  and  v  equality  constraints. 

Solving  this  linear  program  by  means  of  an  interior  point  method  takes  0((4z/+ 
2m  +  2)3  5)  operations  to  produce  a  solution.  The  path  following  algorithm  is  one 
of  such  interior  point  methods.  Monteiro  and  Adler  (1989)  refined  the  path  follow¬ 
ing  algorithm  to  converge  in  0(\j2(2v  +  m  +  1)  log(e0/e))  iterations  by  reducing  the 
duality  gap  from  e0  to  e,  with  O  ((4 v  +  2m  +  2)3)  arithmetic  operations  per  iteration. 

The  quantile  regression  implementation  takes  a  total  of  O  (z/3-5  log(eo/e)),  as¬ 
suming  that  v  is  larger  than  m.  Specialized  algorithms  (see  for  example  Koenker, 
2005)  improve  on  this  solution  approach,  but  further  discussions  are  beyond  the  scope 
of  this  dissertation. 

3.  Superquantile  Regression  —  Primal  Methods 

Let  us  start  with  the  analytical  integration  presented  in  Subsection  III.A.l.  We 
determine  the  computational  performance  of  this  method  when  the  resulting  linear 
program  is  solved  using  an  interior  point  method. 

In  order  to  determine  the  computational  performance  of  problem  p,  we 
need  to  transform  p  into  a  standard  form  linear  programming  problem.  After  this 
transformation,  we  have  2  \{y  —  ua)(u  +  1)  +  m  +  1]  +  v  variables  and  (u  —  ua)u  +  v 
equality  constraints.  Since  va  =  [net],  with  a  being  usually  close  to  1,  va  is  almost 
as  big  as  the  number  of  observations  v  in  the  data  set. 

As  done  with  the  computational  performance  in  the  quantile  regression  case, 
we  use  the  convergence  results  we  find  in  Monteiro  and  Adler  (1989).  The  primal 
method  using  analytical  integration  takes  a  total  of  O  ( v '  log(e0/e)). 

Let  us  now  turn  to  the  numerical  integration  method  described  in  Subsection 
III. A. 2.  Problem  is  a  linear  program  with  m  +  fi  +  1  +  u(/i  +  1)  variables  and 

2is(n+l)  inequality  constraints.  After  transforming  a  standard  form  linear 

program,  we  have  2(/iu  +  /j,  +  u  +  m  +  l)  variables  and  (/i  +  l)v  equality  constraints 
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in  the  primal  method  with  numerical  integration.  Since  the  number  of  observations 
v  and  integration  subintervals  /i  are  both  usually  large  numbers  and  we  disregard 
the  inputted  weights  in  our  complexity  analysis,  the  implementation  of  the  primal 
methods  for  superquantile  regression  takes  a  total  of  O  (/r3"5z/3-5  log(e0/e)). 


4.  Superquantile  Regression  —  Dual  Methods 

We  now  compare  the  computational  performance  of  the  dual  methods.  Since 
in  the  numerical  examples  we  implement  the  subgradient  method  using  a  constant 
stepsize  rule,  we  analyze  the  computational  performance  of  this  algorithm  under  this 
circumstance. 


Let  d(c°)  =  minseR™  ||c°  —  c||  be  the  distance  between  the  initial  guess  c°  and 
the  optimal  solution  c.  And  let  {cfc}  be  the  sequence  generated  by  the  subgradient 
method,  with  the  stepsize  5k  fixed  at  some  positive  constant  5,  with  k  G  {1,2 , ,K}. 

Then,  according  to  Proposition  6.3.3  in  Bertsekas  (2009),  for  any  scalar  e  >  0, 
we  have  that 

So I  tz 

min  f(ck)  <  /(c)  H - ,  (III.  13) 

o <k<K J  y  ’  ~  J  K  J  2  K  ’ 


where  K  is  given  by 


K  = 


d{d 


,0^2 


Se 


(III. 14) 


with  u  being  the  upper  bound  on  the  norm  of  V/(cfc)  G  d /(cfc),  V/c  >  0.  The  number  of 
iterations  is  independent  of  the  number  of  variables  in  the  problem.  The  most  costly 
operation  of  this  algorithm  in  our  case  is  the  computation  of  V/(cfc)  at  any  given 
iteration  k.  Since  the  vector  Z0(ck)  needs  to  be  sorted  in  order  to  compute  V/(cfc), 
as  stated  in  equation  (III. 8),  the  subgradient  method  takes  0{y  logzz)  operations  per 
iteration.  Note  that  by  establishing  5  —  e/u,  we  can  obtain  an  e-optimal  solution  in 
0(l/e2)  iterations.  So  the  subgradient  method  takes  a  total  of  0({l/e2)v  log  v). 

We  note  that  we  present  the  complexity  result  for  the  slowest  of  the  described 
dual  methods.  Implementing  the  Nesterov’s  optimal  method  (see  Nesterov,  1983) 
improves  the  obtained  result  for  a  total  of  0(( l/e)v\ogv). 


These  results  show  that  dual  methods  are  not  much  slower  than  solving  for 
least  squares  regression  and  such  a  conclusion  is  promising  for  superquantile  regres¬ 
sion. 

In  the  next  chapter  we  present  a  series  of  numerical  examples  that  allow  us  to 
compare  runtimes  of  the  various  algorithms. 
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IV. 


NUMERICAL  EXAMPLES 


In  this  chapter,  we  illustrate  superquantile  regression  in  several  numerical  ex¬ 
amples.  We  start  with  a  simple  example  that  allows  us  to  compare  computational 
methods  in  terms  of  runtimes,  solution  vectors  and  function  values.  Then  we  ap¬ 
ply  superquantile  regression  to  the  well-known  data  sets,  Engel  data  and  Brownlee 
stack  loss  plant  data,  and  compare  the  obtained  superquantile  regression  models  to 
least  squares  and  quantile  regression  functions.  In  the  fourth  example,  we  apply  su¬ 
perquantile  regression  to  an  investment  analysis  problem  taken  from  a  case  study 
of  the  Portfolio  Safeguard  documentation  (American  Optimal  Decisions,  Inc.,  2011). 
The  fifth  and  sixth  examples  address  military  applications,  the  first  concerning  U.S. 
Navy  helicopter  pilots  and  the  second  Portuguese  Navy  submariners,  and  in  both 
examples  their  mission  employment.  We  then  show  an  example  that  arises  in  uncer¬ 
tainty  quantification  of  a  rectangular  cross  section  of  a  short  structural  column  under 
uncertain  yield  stress  and  uncertain  loads.  Finally  we  revisit  the  first  example  in  or¬ 
der  to  address  the  issue  of  superquantile  tracking.  We  experiment  different  regression 
models.  We  compare  the  obtained  solution  vectors,  coefficients  of  determination  and 
adjusted  coefficients  of  determinations,  and  implement  Cook’s  distances  applied  to 
superquantile  regression. 

Computations  are  mostly  carried  out  in  Matlab  version  7.14  on  a  2.26  GHz 
laptop  with  8.0  GB  of  RAM  using  Windows  7.  However  we  implement  both  least 
squares  and  quantile  regression  in  R  programming  language  (R  Development  Core 
Team,  2008).  Specifically  for  solving  the  superquantile  regression  problem  with  a 
numerical  integration  scheme  ,  we  use  Portfolio  Safeguard  in  Matlab,  by  Amer¬ 
ican  Optimal  Decisions,  Inc.  (2011),  with  VAN  as  the  optimization  solver.  Since  we 
assume  the  subintervals  are  equally  likely  when  solving  for  the  primal  method  using 
numerical  integration  schemes,  from  now  on  we  denote  problem  P((j(Jmu'  by  P^m  in¬ 
stead,  and  assume  wy  =  1/ id,  for  i  =  0, 1,  ...,/x.  When  solving  the  primal  method  with 
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analytical  integration,  P^p,  and  the  dual  method,  Dv,  we  employ  GAMS  version  23.7 
with  the  CPLEX  12.3  solver. 

A.  COMPUTATIONAL  COST 

We  start  by  considering  a  loss  random  variable 

Y  =  Ad  +  X2e, 

where  e  is  a  standard  normal  random  variable  and  X  =  (A| ,  X2)  is  uniformly  dis¬ 
tributed  on  [—1, 1]  x  [0, 1],  with  e,  Ad,  and  X2  independent.  We  consider  a  regression 
function  of  the  form  f(x)  =  c0  +  C\X\  +  c2x2  and  set  a  =  0.90.  This  simple  example 
serves  as  a  comparison  tool  between  computational  methods  and  their  performances, 
as  well  as  the  obtained  approximate  solution  vectors,  since  we  know  the  conditional 
superquantile,  which  in  this  case  is  (co,  ci,  c2)  =  (0, 1, 1.755).  The  detailed  explanation 
can  be  seen  in  Section  IV. H.  We  give  an  initial  guess  (c^c®)  =  (0,0)  when  imple¬ 
menting  the  dual  methods,  and  c0  is  consecutively  computed  utilizing  the  regression 
vector  (ci,c2)  obtained  by  the  implemented  algorithm. 

We  first  examine  the  computational  effort  required  to  obtain  an  approximate 
regression  vector.  Table  1  shows  computing  times  for  solving  problem  P£ p  f°r  increas¬ 
ingly  larger  sample  sizes  v  obtained  by  independent  draws  from  (e,Ad,Ad).  While 
the  results  correspond  to  single  instances  of  Pp p,  the  times  vary  little  between  two 
instances  of  the  same  size  and  the  computing  times  are  therefore  representative.  As 
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Table  1.  Example  A:  Computing  times  (sec.)  for  solving  Pp p  f°r  increasingly  larger 
sample  sizes  is. 


predicted  from  the  theoretical  results  discussed  at  the  end  of  Subsection  III.A.l,  the 
computing  time  grows  rapidly  as  the  sample  size  is  increases.  In  addition  to  the 
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inconvenience  of  long  computing  times,  memory  requirements  become  problematic. 
Therefore  solving  Pp p  for  large  sample  sizes  is  challenging,  if  not  impossible,  and  we 
examine  alternative  approaches. 

Second,  we  consider  the  alternative  primal  method  approach  based  on  solving 
-^Num-  While  this  approach  introduces  a  numerical  integration  error,  Proposition  III.  1 
ensures  that  the  error  is  negligible  for  large  //.  In  fact,  as  we  see  next  empirically, 
moderately  large  /x  suffices  for  probability  levels  a  close  to  one.  Moreover,  the  sub¬ 
stantial  reduction  in  problem  size,  as  compared  to  that  of  Ppp,  reduces  computing 
times  dramatically. 

Since  q^(Z^(c))  may  be  nonsmooth  as  a  function  of  (3,  standard  numerical 
integration  error  bounds  may  not  apply.  However,  since  q^Z^c))  is  continuous  and 
nondecreasing  as  a  function  of  (3,  the  use  of  left-endpoint  and  right-endpoint  numerical 
integration  rules  in  P^m  provide  lower  and  upper  bounds  on  the  optimal  value  of 
DSqRu,  respectively.  Table  2  shows  solution  vectors  (c0,ci,c2)  for  ytx  =  100,  ytx  = 
1000,  and  ytx  =  5000  integration  subintervals,  when  we  implement  left-endpoint,  right- 
endpoint,  and  Simpson’s  numerical  integration  schemes,  for  sample  sizes  of  v  =  100, 
v  =  1000,  v  =  10000,  and  v  =  100000. 

For  v  =  100,  we  notice  that  the  solutions  are  insensitive  to  the  numerical 
integration  rule  as  well  as  the  subintervals  ytx  specific  to  the  integration  scheme.  The 
obtained  solutions  are  essentially  identical  to  the  regression  vector  obtained  from  Ppp] 
see  Row  2  of  Table  2.  Here  the  superquantile  coefficient  of  determination  is  R q  90  = 
0.5683  for  all  the  presented  cases,  including  Ppp,  which  also  supports  the  fact  that 
the  numerical  integration  rule  does  not  affect  the  obtained  solution.  Each  one  of  the 
solutions  of  P^um  for  v  =  100  is  obtained  quickly,  in  about  0.08  to  8  seconds  for 
yU  =  100,  yfx  =  1000,  and  ytx  =  5000;  see  the  last  column  of  Table  2.  In  this  case,  we 
clearly  notice  that  y u  =  100  suffices  and  takes  less  than  a  tenth  of  a  second  to  run. 

When  we  increase  the  sample  size  is,  we  start  to  notice  that  the  solution  vectors 
are  slightly  different  but  the  magnitudes  of  these  differences  are  small  for  subintervals 
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Problem 

Integration  Rule 

V 

R 

Co 

Cl 

c2 
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1000 
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Table  2.  Example  A:  Solution  vectors  and  computing  times  (sec.)  for  varying  number 
of  observations  u,  integration  rules  for  solving  as  well  as  number  of  integration 

subintervals  fi. 
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of  /x  =  100  and  //  =  1000.  For  numerical  integration  scheme  implementations  with 
/j  =  5000,  these  differences  are  almost  inexistent,  but  the  computing  times  are  larger. 
Therefore  the  statistician  should  take  this  into  consideration  when  selecting  the  num¬ 
ber  of  subintervals  for  the  numerical  integration  scheme.  It  is  a  tradeoff  between 
obtaining  better  solutions  versus  computing  times.  Also  we  notice  there  is  another 
issue  we  encounter  when  solving  superquantile  regression  problems  for  sample  sizes 
as  large  as  100000  observations.  In  Rows  31-39  of  Table  2,  we  intentionally  include 
the  solution  vectors  for  v  =  100000  using  the  same  number  of  subintervals  as  imple¬ 
mented  in  the  other  cases.  The  discrepancies  in  solution  vectors  are  consequence  of 
rounding  errors  and  we  refer  to  Borges  (2011)  for  further  details. 

One  detail  that  is  not  included  in  Table  2  is  the  coefficient  of  determination 
R2ogo.  For  all  the  presented  cases,  the  coefficient  of  determination  takes  the  values  of 
0.4222,  0.3917,  and  0.1029,  for  sample  sizes  v  =  1000,  v  =  10000,  and  v  =  100000, 
respectively.  We  notice  that  Aq  90  decreases  as  we  increase  the  size  of  the  data  sample, 
which  means  that  the  linear  model  f(x )  =  c0  +  C\Xi  +  c2x2  does  not  fully  capture  the 
variability  of  the  data,  as  expected,  and  a  study  of  other  models  may  be  warranted. 
However,  we  omit  such  an  investigation. 

As  discussed  in  Chapter  III,  the  dual  method  is  another  approach  to  solve  the 
deviation-based  superquantile  regression  problem  which  theoretically  demonstrates 
potential  for  large  sample  sizes.  Since  numerical  integration  not  only  introduces  a 
numerical  integration  error  but  also  takes  increasingly  longer  to  run  for  increasing 
sample  sizes,  we  proceed  with  the  implementation  of  the  dual  methods. 

Third,  we  solve  the  superquantile  regression  problem  by  implementing  the  dual 
methods  of  Section  B  in  Chapter  III,  i.e.,  subgradient,  coordinate  descent,  and  cutting 
plane  methods.  Since  defining  the  stepsize  and  tolerance  for  the  three  algorithms,  as 
well  as  the  maximum  number  of  iterations  in  the  specific  case  of  the  subgradient 
method,  can  be  a  difficult  process,  we  establish  the  following  input  parameters  for 
each  algorithm  as  a  natural  choice  for  us  to  be  able  to  compare  all  three  methods.  We 
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note  that  refining  these  parameters  as  well  as  implementing  more  efficient  algorithms 
could  return  even  better  computing  times  but  such  an  investigation  is  not  the  purpose 
here.  Our  goal  with  this  example  is  to  demonstrate  the  potential  for  dual  methods. 
For  the  subgradient  method,  we  fix  the  stepsize  to  a  constant  value,  5  =  0.1,  and  run 
the  algorithm  for  1000  iterations.  In  the  case  of  the  coordinate  descent  method,  we 
include  a  tolerance  of  10~12  and  define  1000  as  the  maximum  number  of  iterations. 
We  implement  the  cutting  plane  method  with  a  maximum  of  1000  cuts,  and  a  gap  of 
10~8. 

Table  3  shows  the  computing  times  needed  for  solving  problem  Du  for  increas¬ 
ingly  large  sample  sizes  v  implementing  these  algorithms.  Here  the  computing  times 
are  also  representative,  for  the  same  reason  as  in  Table  1.  As  expected,  the  subgradi- 


V 

Computing  Times 

Subgradient 

Coordinate  Descent 

Cutting  Plane 

100 

0.13 

0.67 

0.91 

1000 

0.34 

1.20 

2.07 

5000 

1.43 

0.70 

0.95 

10000 

2.88 

0.88 

3.00 

25000 

5.96 

3.24 

2.20 

50000 

11.92 

8.30 

1.73 

75000 

21.53 

13.78 

1.98 

100000 

27.38 

19.20 

1.78 

Table  3.  Example  A:  Computing  times  (sec.)  for  solving  Dv  using  different  imple¬ 
mentations  of  the  dual  methods  for  increasing  sample  sizes  v. 


ent  method  is  the  slowest  of  the  three  algorithms  for  almost  all  the  presented  cases, 
especially  for  sample  sizes  greater  than  1000  observations.  In  all  the  described  dual 
methods  and  empirically,  the  computing  times  grow  linearly  with  the  sample  size  z/, 
with  the  cutting  plane  method  having  the  smallest  slope  of  the  three,  as  shown  in 
Figure  6. 

In  Figure  7,  we  picture  the  computing  times  for  primal  versus  dual  methods, 
in  logarithmic  scale.  Here  we  choose  to  present  the  Simpson’s  integration  rule  with 
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Figure  6.  Example  A:  Computing  times  for  solving  Du  with  three  different  algorithms 
(subgradient,  coordinate  descent,  and  cutting  plane  methods),  for  increasing  sample 
sizes  is. 

H  =  1000  subintervals  and  the  dual  methods  algorithms  with  the  input  parameters 
as  stated  before.  We  clearly  notice  that  implementing  the  cutting  plane  method 
improves  the  computational  performance,  especially  for  large  sample  sizes.  Also,  for 
larger  samples  sizes  and  smaller  probability  levels  a,  we  certainly  need  to  increase 
the  number  of  integration  subintervals  fi. 

We  also  compare  the  obtained  solution  vectors  and  corresponding  objective 
function  values;  see  Table  4.  Again  we  note  that  it  is  not  possible  to  solve  Ppp  for 
sample  sizes  larger  than  is  =  1000.  We  use  Simpson’s  rule  with  /i  =  1000  intervals  as 
the  numerical  integration  scheme  for  all  sample  sizes.  We  realize  that  the  obtained 
solution  vectors  are  nearly  identical. 

Finally  we  analyze  how  changing  the  probability  level  a  and  the  number  of 
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Figure  7.  Example  A:  Primal  versus  dual  methods  computing  times  for  increasing 
sample  sizes  v,  in  logarithmic  scale. 

observations  v  affect  the  computational  performance  of  the  implemented  algorithms. 
Obviously,  the  primal  methods  are  affected  by  changes  in  the  probability  level  a  since 
the  integral  in  problem  DSqRv  is  defined  between  a  and  1.  The  smaller  the  value  of  a, 
the  smaller  ua  =  \voi\  gets,  and  consequently  the  number  of  variables  and  inequality 
constraints  in  problem  Ppp  increases  due  to  the  increased  difference  (u  —  ua).  In 
the  numerical  integration  schemes,  the  smaller  a  gets,  the  more  subintervals  /r  are 
required  to  obtain  same  accuracy. 

As  we  can  observe  in  Table  5,  the  sample  size  v  influences  the  computing  times 
of  the  subgradient  method,  but  the  probability  level  a  does  not  produce  such  an  effect. 
We  note  that  the  computing  times  for  the  sample  sizes  v  =  100  and  v  =  1000  are  not 
exactly  the  same  for  all  the  presented  probability  levels  a.  These  values  differ  in  the 
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Computational 

Method 

V 

h 

c0 

Cl 

c2 

Function 

Value 

Time 

Analytical  Int. 

100 

NA 

0.0630 

1.0951 

1.5841 

0.844477 

0.05 

Numerical  Int. 

100 

1000 

0.0630 

1.0951 

1.5841 

0.844477 

0.77 

Subgradient 

100 

NA 

0.0673 

1.1004 

1.5699 

0.844575 

0.13 

Coord.  Descent 

100 

NA 

0.0631 

1.0952 

1.5839 

0.844478 

0.67 

Cutting  Plane 

100 

NA 

0.0630 

1.0951 

1.5841 

0.844477 

0.91 

Analytical  Int. 

1000 

NA 

0.0680 

1.0108 

1.7322 

1.049276 

174.24 

Numerical  Int. 

1000 

1000 

0.0680 

1.0108 

1.7322 

1.049276 

1.21 

Subgradient 

1000 

NA 

0.0680 

1.0109 

1.7321 

1.049276 

0.34 

Coord.  Descent 

1000 

NA 

0.0680 

1.0109 

1.7321 

1.049276 

1.20 

Cutting  Plane 

1000 

NA 

0.0680 

1.0109 

1.7320 

1.049276 

2.07 

Analytical  Int. 

10000 

— 

— 

— 

— 

— 

— 

Numerical  Int. 

10000 

1000 

0.0818 

1.0048 

1.6430 

1.092066 

5.27 

Subgradient 

10000 

NA 

0.0818 

1.0048 

1.6429 

1.092033 

2.88 

Coord.  Descent 

10000 

NA 

0.0834 

1.0047 

1.6378 

1.092040 

0.88 

Cutting  Plane 

10000 

NA 

0.0817 

1.0049 

1.6432 

1.092033 

3.00 

Table  4.  Example  A:  Solution  vectors  and  computing  times  (sec.)  for  the  superquan¬ 
tile  regression  problem  with  varying  computational  methods,  and  sample  sizes  v. 


Dual 

Method 

a 

V 

Co 

Cl 

c2 

Function 

Value 

Time 

100 

-0.0478 

1.1038 

0.6420 

0.502796 

0.14 

0.25 

1000 

-0.0557 

0.9988 

0.6726 

0.584710 

0.33 

10000 

-0.0398 

0.9990 

0.6048 

0.608587 

2.61 

Subgradient 

100 

0.0163 

1.0901 

0.8440 

0.598695 

0.14 

0.50 

1000 

0.0309 

0.9943 

0.8822 

0.705208 

0.33 

Method 

10000 

0.0390 

1.0014 

0.8239 

0.732942 

2.99 

100 

0.0390 

1.1029 

1.2056 

0.729180 

0.14 

0.75 

1000 

0.0762 

0.9976 

1.2239 

0.875049 

0.33 

10000 

0.0742 

1.0009 

1.2050 

0.905114 

2.93 

Table  5.  Example  A:  Solution  vectors  and  computing  times  (sec.)  for  solving  Du 
when  implementing  the  subgradient  method  with  varying  probability  levels  a  and 
number  of  observations  v. 
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third  decimal  places,  which  makes  the  magnitude  of  such  differences  negligible. 


Table  6  presents  the  solution  vectors  and  computing  times  for  the  coordinate 
descent  method  for  different  probability  levels  a  and  sample  sizes  v.  Similarly  to  the 
subgradient  method,  we  realize  that  only  the  sample  size  v  has  a  significant  effect  on 
the  computing  times. 


Dual 

Method 

a 

V 

Co 

Cl 

c2 

Function 

Value 

Time 

100 

-0.0478 

1.1038 

0.6419 

0.502796 

0.62 

0.25 

1000 

-0.0227 

0.9995 

0.5937 

0.585203 

0.16 

Coordinate 

10000 

-0.0392 

0.9990 

0.6034 

0.608588 

2.51 

100 

0.0163 

1.0901 

0.8439 

0.598695 

0.65 

Descent 

0.50 

1000 

0.0340 

0.9943 

0.8734 

0.705218 

0.21 

10000 

0.0403 

1.0014 

0.8205 

0.732944 

1.58 

Method 

100 

0.0390 

1.1029 

1.2056 

0.729180 

0.70 

0.75 

1000 

0.0771 

0.9977 

1.2213 

0.875050 

0.21 

10000 

0.0763 

1.0009 

1.1987 

0.905121 

1.03 

Table  6.  Example  A:  Solution  vectors  and  computing  times  (sec.)  for  solving  Dv 
when  implementing  the  coordinate  descent  method  with  varying  probability  levels  a 
and  number  of  observations  u. 


Dual 

Method 

a 

V 

Co 

Cl 

c2 

Function 

Value 

Time 

100 

-0.0479 

1.1038 

0.6420 

0.502797 

1.13 

0.25 

1000 

-0.0556 

0.9989 

0.6723 

0.584710 

1.51 

Cutting 

10000 

-0.0399 

0.9989 

0.6049 

0.608587 

2.90 

100 

0.0162 

1.0901 

0.8440 

0.598695 

2.23 

Plane 

0.50 

1000 

0.0307 

0.9944 

0.8827 

0.705208 

1.21 

10000 

0.0389 

1.0017 

0.8243 

0.732943 

1.22 

Method 

100 

0.0390 

1.1029 

1.2056 

0.729180 

0.80 

0.75 

1000 

0.0763 

0.9976 

1.2238 

0.875049 

1.97 

10000 

0.0739 

1.0008 

1.2059 

0.905114 

1.16 

Table  7.  Example  A:  Solution  vectors  and  computing  times  (sec.)  for  solving  Du 
when  implementing  the  cutting  plane  method  with  varying  probability  levels  a  and 
number  of  observations  v. 
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A  different  result  is  obtained  when  we  implement  the  cutting  plane  method, 
as  shown  in  Table  7.  The  computing  time  differences  are  not  significant  in  any  of  the 
cases.  Here  we  run  the  cutting  plane  method  with  bounds  on  the  vectors  (c*,  c*),  for 
iteration  k.  Decreasing  these  bounds  by  making  them  more  restrictive,  and  reducing 
the  maximum  number  of  cuts  that  the  algorithm  can  add,  reduces  the  computing 
times  shown  by  Column  8  in  Table  7,  and  the  magnitudes  of  the  computing  times 
differences  become  even  less  significant. 

Out  of  curiosity,  if  we  implement  the  subgradient  method  for  this  example  with 
ten  times  more  iterations,  i.e.,  a  total  of  10000  iterations,  and  reduce  the  stepsize  to 
5  =  0.01,  we  find  that  the  solution  vectors  are  exactly  the  same  as  the  ones  presented 
in  Table  5,  with  the  same  objective  function  values,  but  the  computing  times  increase 
by  at  least  a  factor  of  10  in  the  cases  of  u  —  10000  and  v  =  100000,  a  factor  of  18  for 
v  =  1000,  and  a  factor  of  30  for  v  =  100.  This  shows  how  important  the  selection  of 
the  right  stepsize  and  maximum  number  of  iterations  is. 

From  this  example  we  conclude  that  for  small  sample  sizes  it  is  beneficial  to 
run  the  primal  method  using  analytical  integration,  since  we  obtain  the  exact  solution 
vector  and  the  computing  times  are  not  drastically  higher  than  solving  Dv .  As  the 
sample  size  increases,  the  results  show  that  we  should  rely  on  the  dual  method  and 
implement  the  cutting  plane  method  or  any  other  algorithm  that  is  comparable  to 
the  cutting  plane  method.  Another  aspect  we  observe  is  the  fact  that  the  probability 
level  a  does  not  produce  any  visible  effect  on  the  dual  methods  computing  times.  To 
the  contrary,  the  primal  methods,  with  analytical  or  numerical  integration  schemes, 
are  clearly  affected  due  to  changes  in  a  since  the  integral  interval  is  adjusted  accord¬ 
ingly  and  the  number  of  variables  changes.  Implementing  the  primal  methods  with 
numerical  integration  schemes  implies  the  wise  selection  of  the  number  of  subintervals 
fi  according  to  the  sample  size  v  and  probability  level  a. 
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B.  ENGEL  DATA 

This  next  example  is  based  on  a  data  set  originally  worked  by  Ernst  Engel  in 
1857,  and  used  by  Koenker  and  Bassett  in  their  regression  quantiles  studies  (Koenker 
&  Bassett  Jr.,  1982).  Engel  presents  this  data  set  to  show  his  hypothesis  that  the 
annual  expenditures  on  food  for  Belgian  working  class  families  increase  less  than  the 
increase  of  their  annual  household  incomes.  In  Koenker  (2005),  the  author  uses  this 
data  set  as  an  example  to  address  the  issues  of  estimating  the  asymptotic  covariance 
matrix  in  statistical  inference  for  quantile  regression  and  estimates  six  quantile  re¬ 
gression  functions  for  probability  levels  a  G  {0.05,  0.10, 0.25,  0.75,  0.90,  0.95}.  For  this 
example,  we  are  interested  in  comparing  these  obtained  quantile  regression  functions 
with  superquantile  regression  functions  at  the  same  probability  levels  a,  and  verify 
how  both  regression  techniques  differ  conceptually. 

We  have  a  data  set  of  235  observations  of  the  income  and  the  expenditure  on 
food  in  Belgian  francs  for  Belgian  working  class  annual  households,  see  Figure  8(a). 
As  done  in  Koenker  (2005),  we  also  transform  both  variables  to  the  logarithmic  scale, 
see  Figure  8(b).  We  seek  to  quantify  the  food  expenditure  Y  and  consider  a  linear 
regression  function  fi(X)  =  cq  +  cX,  where  A"  is  the  explanatory  random  variable 
that  represents  the  household  income  for  Belgian  working  class. 

In  Figure  9(b),  we  observe  the  a-quantile  regression  models,  for  probability 
levels  a  G  {0.05,0.10,0.25,0.50,0.75,0.90,0.95},  in  logarithmic  scale.  Here  we  also 
include  the  least  squares  and  the  0.50-quantilc  regression  functions  for  comparison 
and  highlight  the  obtained  0.75-quantile  regression  function  that  we  use  later  in  this 
example.  Although  some  of  these  quantile  regression  functions  look  parallel,  their 
slopes  are  distinct;  see  Koenker  (2005)  for  further  discussion.  These  slope  differences 
are  more  evident  in  Figure  9(a). 

In  Figure  10,  we  present  the  a-superquantilc  regression  models,  for  different 
probability  levels  a  G  {0.05,0.10,0.25,0.50,0.75,0.90,0.95}.  Again  we  include  the 
least  squares  and  the  0.50-superquantile  regression  functions  and  highlight  the  ob- 
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Household  Income 


(a)  Original  display. 
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(b)  Logarithmic  scale  display. 

Figure  8.  Example  B:  Engel  data  set 


(a)  Original  display. 


(b)  Logarithmic  scale  display. 


Figure  9.  Example  B:  Least  squares  and  quantile  regression  functions,  for  varying  a. 
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tained  0.75-super  quantile  regression  fit. 

An  interesting  detail  shown  in  Figure  9  is  that  the  obtained  quantile  regression 
models  nearly  “span”  the  observations,  i.e. ,  we  have  regression  functions  above  and 
below  the  least  squares  regression  fit.  As  we  observe  in  Figure  10,  the  superquan¬ 
tile  regression  models  for  varying  probability  levels  a  do  not  have  such  property. 
One  would  have  to  change  the  orientation  of  the  original  problem  in  order  to  obtain 
regression  functions  below  the  least  squares  regression  model,  since  %(Y)  =  E[Y], 

In  order  to  compare  the  obtained  regression  vectors  and  the  corresponding 
coefficient  of  determination  for  the  model  f\  (x)  =  Co  +  c\x,  we  refer  to  Table  8.  We 
cosider  the  same  probability  levels  a  as  shown  in  Figures  9  and  10.  Due  to  the  small 
sample  size,  235  observations,  we  solve  the  deviation-based  superquantile  regression 
problem  by  analytical  integration,  Ppp.  We  refer  to  Figure  11(a)  to  show  how  close  the 
quantile  and  superquantile  linear  regression  functions  are  in  the  case  where  a  =  0.75. 

We  now  consider  a  quadratic  model  of  the  form  f2(x)  =  c0  +  C\X  +  c2x2. 
In  Figure  11(b),  we  observe  the  different  quadratic  fits  for  least  squares,  quantile, 
and  superquantile  regressions.  Although  both  graphs  in  Figure  11  show  that  the 
0.75-super  quantile  regression  functions  look  exactly  alike,  Figure  11(b)  actually  has 
a  curvature  that  can  be  noted  using  different  scales  on  the  horizontal  axis.  Table 
8  shows  the  obtained  regression  vectors  (c0,ci,c2)  for  the  quadratic  model,  using 
distinct  regression  techniques.  We  note  that  the  coefficient  of  determination  for  the 
linear  are  slightly  smaller  than  for  the  quadratic  models,  which  means  that  adding 
the  term  c2x2  slightly  improves  the  obtained  results  in  some  sense. 

In  Figure  12,  we  visualize  quantile  and  superquantile  regression  functions  for 
varying  probability  levels  a.  It  is  interesting  to  notice  how  the  quantile  regression  fits 
are  severely  influenced  by  one  observation  where  four  quantile  regression  functions 
cross  each  other  just  below  the  least  squares  fit,  represented  by  the  big  black  dot.  To 
the  contrary,  the  obtained  superquantile  regression  fits  are  not  greatly  influenced  by 
this  observation  and  depict  other  observations. 
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Food  Expenditure  Food  Expenditure 


(a)  Original  display. 


(b)  Logarithmic  scale  display. 


Figure  10.  Example  B:  Least  squares  and  superquantile  regression  functions,  for 
varying  a. 


Regression  Model 

a 

c0 

Cl 

c2 

Rl 

Least  Squares 

NA 

147.475 

0.4852 

— 

0.8304 

0.05 

124.880 

0.3434 

— 

— 

0.10 

110.142 

0.4018 

— 

— 

0.25 

95.4835 

0.4741 

— 

— 

Quantile 

0.50 

81.4822 

0.5602 

— 

— 

0.75 

62.3966 

0.6440 

— 

— 

0.90 

67.3509 

0.6863 

— 

— 

0.95 

64.1040 

0.7091 

— 

— 

0.05 

18.8791 

0.6370 

— 

0.6882 

0.10 

27.0860 

0.6387 

— 

0.6913 

0.25 

45.2404 

0.6425 

— 

0.7043 

Superquantile 

0.50 

52.3684 

0.6657 

— 

0.7322 

0.75 

57.3732 

0.6924 

— 

0.7716 

0.90 

77.4796 

0.7039 

— 

0.8070 

0.95 

88.6620 

0.7097 

— 

0.8223 

Least  Squares 

NA 

8.0060 

0.7100 

-6.603e-5 

0.8671 

0.05 

-31.7001 

0.6815 

-1.295e-4 

— 

0.10 

52.6260 

0.5009 

-2.884e-5 

— 

0.25 

22.8226 

0.6123 

-5.009e-5 

— 

Quantile 

0.50 

5.7593 

0.7243 

-7.198e-5 

— 

0.75 

-26.0488 

0.8378 

-9.360e-5 

— 

0.90 

72.2423 

0.6724 

7.838e-6 

— 

0.95 

44.3764 

0.7445 

-1.419e-5 

— 

0.05 

-28.7584 

0.7354 

-4.243e-5 

0.6903 

0.10 

-13.3480 

0.7212 

-3.498e-5 

0.6928 

0.25 

17.2230 

0.6946 

-1.896e-5 

0.7050 

Superquantile 

0.50 

32.8155 

0.7034 

-1.439e-5 

0.7327 

0.75 

45.6962 

0.7144 

-8.130e-6 

0.7717 

0.90 

54.6966 

0.7461 

-1.467e-5 

0.8079 

0.95 

53.0274 

0.7777 

-2.522e-5 

0.8241 

Table  8.  Example  B:  Solution  vectors  (co,Ci)  and  coefficients  of  determination  for 
the  linear  model  of  the  form  fi(x)  =  Co  +  C\X,  and  solution  vectors  (co,Ci,C2)  and 
coefficients  of  determination  for  the  quadratic  model  of  the  form  /2(x)  =  cq  +  C\X  + 
c2x2. 
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Food  Expenditure 


-  0.75-Superquantile  Regression 

-  0.75-Quantile  Regression 

-  Least  Squares  Regression 

l  l  l 

3000  4000  5000 


Household  Income 


(a)  Linear  model  fi(x)  =  Cq  +  C\X. 


(b)  Quadratic  model  /Rat)  =  Co  +  cite  +  C2X2. 

Figure  11.  Example  B:  Regression  functions  for  linear  and  quadratic  models. 


Food  Expenditure  Food  Expenditure 


(a)  Quantile  regression  for  varying  probability  levels  a. 


(b)  Superquantile  regression  for  varying  probability  levels  a. 


Figure  12.  Example  B:  Least  squares,  quantile,  and  superquantile  regression  functions 
for  the  quadratic  model  /2(a;)  =  Co  +  c\x  +  c^x2- 


As  a  conclusion  to  this  example,  we  note  that  superquantile  regression  brings 
additional  information  concerning  the  tail  realizations  of  our  loss  random  variable. 
The  linear  fits  from  quantile  and  superquantile  regressions  are  close,  with  only  a  slight 
difference  in  slope.  However,  the  quadratic  superquantile  model  provides  a  distinct 
perspective.  In  the  quadratic  case,  quantile  regression  is  highly  affected  in  a  dubious 
manner  by  one  observation. 

C.  BROWNLEE  STACK  LOSS  PLANT  DATA 

This  example  is  based  on  a  data  set  with  21  observations  from  the  Brownlee 
stack  loss  plant  data  set,  which  defines  the  oxidation  of  ammonia  (NH3)  to  nitric  acid 
(HN03)  of  a  plant,  as  described  in  detail  in  Brownlee  (1965). 

We  seek  to  estimate  the  stack  loss  random  variable  Y ,  representing  ten  times 
the  percentage  of  ammonia  going  into  the  plant  that  escapes  from  the  absorption 
tower,  using  three  explanatory  random  variables:  air  flow  (Xaf),  which  represents 
the  rate  of  operation  of  the  plant;  water  temperature  (Xwt),  which  denotes  the  tem¬ 
perature  of  cooling  water  circulated  through  coils  in  the  absorption  tower;  and  acid 
concentration  (Wac),  [per  1000,  minus  500]. 

Figure  13  shows  the  scatterplot  matrix  of  the  stack  loss  data,  where  we  observe 
the  pairwise  correlations.  Here  we  notice  a  linear  correlation  between  stack  loss  and 
air  flow  and  a  polynomial  correlation  between  stack  loss  and  water  temperature.  We 
explore  these  two  possible  models  and  compare  the  obtained  results  with  coefficient 
of  determination  calculations,  as  described  in  Section  II. C. 

We  first  consider  a  linear  model  of  the  form  fi(x)  =  c0  +  caf£af  +  cwt2;wt  +  caca;ac. 
Table  9  shows  the  obtained  regression  coefficients  after  solving  P£ p-  All  the  instances 
of  problem  P^p  take  approximately  one  quarter  of  a  second  to  run  due  to  the  small 
number  of  observations  in  the  data  sample. 

From  Table  9,  we  conclude  that  a  linear  model  with  all  three  explanatory  ran¬ 
dom  variables  is  reasonable.  It  is  interesting  to  note  that  the  resulting  coefficients  of 
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Figure  13.  Example  C:  Stack  loss  data  scatterplot  matrix. 
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Regression 

a 

Co 

Caf 

Cwt 

^ac 

Rl 

D  2 

-H'a.Adj 

Least  Squares 

NA 

-39.9197 

0.7156 

1.2953 

-0.1521 

0.9136 

0.8983 

0.25 

-36.0000 

0.5000 

1.0000 

0.0000 

— 

— 

Quantile 

0.50 

-39.6899 

0.8319 

0.5739 

-0.0609 

— 

— 

0.75 

-54.1897 

0.8707 

0.9828 

0.0000 

— 

— 

0.90 

-58.5433 

0.7930 

1.3054 

0.0382 

— 

— 

0.25 

-55.1432 

0.8056 

1.2037 

0.0000 

0.7478 

0.7033 

Superquantile 

0.50 

-58.6210 

0.7930 

1.3054 

0.0382 

0.7750 

0.7353 

0.75 

-60.1368 

0.7500 

1.4561 

0.0570 

0.8050 

0.7706 

0.90 

-58.4620 

0.5246 

1.8584 

0.1073 

0.8231 

0.7919 

Table  9.  Example  C:  Regression  vectors,  R2a,  and  R2aAd]  for  the  linear  model  f\  which 
includes  all  explanatory  variables,  and  for  different  probability  levels  a. 


a 

0.05 

0.10 

0.15 

0.20 

0.25 

0.30 

0.35 

0.40 

0.45 

K 

0.7384 

0.7402 

0.7423 

0.7447 

0.7478 

0.7516 

0.7563 

0.7618 

0.7682 

a 

0.50 

0.55 

0.60 

0.65 

0.70 

0.75 

0.80 

0.85 

0.90 

K 

0.7750 

0.7818 

0.7883 

0.7944 

0.8001 

0.8050 

0.8110 

0.8173 

0.8231 

Table  10.  Example  C:  Coefficients  of  determination  for  different  probability  levels  a. 

determination  R2a  and  adjusted  coefficients  of  determination  R2a  Adj  for  superquantile 
regression  increase  with  a,  which  lead  us  to  further  experiment  for  various  probability 
levels  a.  Table  10  shows  the  obtained  coefficients  of  determination  for  varying  a. 

We  next  analyze  a  simpler  model,  using  water  temperature  as  the  only  available 
explanatory  variable,  and  compare  the  corresponding  linear  /2(x)  =  c0  +  cwta;wt  and 
quadratic  models  /3(x)  =  Co  +  cwta:wt  +  cwt 2X^3  see  Table  11.  For  the  situation  where 
one  only  has  water  temperature  as  the  explanatory  variable,  applying  the  quadratic 
model  /3  slightly  reduces  the  coefficients  of  determination.  However  we  plot  the 
obtained  regression  functions,  see  Figure  14(b),  and  notice  that  the  quadratic  models 
better  represent  the  data. 

It  is  interesting  to  notice  that  the  0.90-quantile  and  0.90-superquantile  regres¬ 
sion  functions  are  exactly  the  same  in  the  quadratic  model  /3.  This  is  due  to  a  small 


92 


Model 

Regression 

a 

Co 

Cwt 

Cwt2 

Rl 

c>2 

/la,Ad.i 

Least  Squares 

NA 

-41.9109 

2.8174 

— 

0.7665 

0.7542 

0.25 

-32.0000 

2.1667 

— 

— 

— 

Quantile 

0.50 

-47.8571 

3.1429 

_ 

_ 

— 

0.75 

-41.0000 

2.8889 

— 

— 

— 

/a 

0.90 

-42.0000 

3.1111 

— 

— 

— 

0.25 

-43.6667 

3.0000 

— 

0.5649 

0.5420 

Superquantile 

0.50 

-41.7619 

3.0000 

— 

0.5954 

0.5741 

0.75 

-39.1905 

3.0000 

— 

0.6440 

0.6250 

0.90 

-38.0476 

3.0000 

— 

0.6715 

0.6540 

Least  Squares 

NA 

151.5654 

-15.2555 

0.4131 

0.8755 

0.8617 

0.25 

148.6000 

-15.1583 

0.4083 

— 

— 

Quantile 

0.50 

200.8500 

-19.8333 

0.5167 

— 

— 

0.75 

110.1429 

-11.1381 

0.3191 

— 

— 

h 

0.90 

205.5714 

-20.6714 

0.5571 

— 

— 

0.25 

167.5589 

-16.9167 

0.4583 

0.6676 

0.6306 

Superquantile 

0.50 

183.9524 

-18.5000 

0.5000 

0.6884 

0.6538 

0.75 

205.4789 

-20.6714 

0.5571 

0.7490 

0.7211 

0.90 

205.5714 

-20.6714 

0.5571 

0.7792 

0.7546 

Table  11.  Example  C:  Regression  vectors,  K2a)  and  B?a  A dj  for  linear  and  quadratic 
models,  f-2  and  fs,  respectively,  for  varying  probability  levels  a. 


data  set  and  how  the  observations  are  dispersed.  For  example,  here  we  have  three 
observations  at  sample  point  (ad,  yJ)  =  (20, 15).  For  such  a  very  small  data  set,  hav¬ 
ing  coincident  observations  does  not  help  obtaining  better  quantile  or  superquantile 
regression  fits.  We  notice  that  the  0.75-quantile  regression  function  is  a  clear  example 
of  how  having  small  data  sets  aggravated  by  overlapping  observations  influences  the 
obtained  regression  vector  and  may  cause  the  function  to  shift  accordingly.  In  this 
case,  we  realize  that  both  0.75-quantile  and  0.75-superquantile  regression  functions 
cross  the  point  (ad,  t/Q  =  (20,15). 

As  a  conclusion  to  this  example,  we  note  that  small  data  sets  in  superquantile 
regression  are  problematic  to  deal  with.  As  a  thumb  rule,  one  needs  1/(1  —  a)  times 
more  observations  when  performing  superquantile  regression  than  when  in  the  case 
of  least  squares  regression.  Therefore  the  obtained  approximating  regression  vectors 
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(a)  Linear  model  f2{x)  =  Co  +  cwtXwt- 


(b)  Quadratic  model  f3(x)  =  c0  +  cwtxwt  +  cwt2xi,t. 

Figure  14.  Example  C:  Regression  functions  for  linear  and  quadratic  models. 
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for  small  data  sets  should  be  considered  with  care  when  used  in  decision  making 
processes. 

D.  INVESTMENT  ANALYSIS 

The  next  example  is  a  case  study  taken  from  the  “Style  Classification  with 
Quantile  Regression”  documentation  in  Portfolio  Safeguard,  by  American  Optimal 
Decisions,  Inc.  (2011),  and  deals  with  the  negative  return  of  the  Fidelity  Magellan 
Fund  as  predicted  by  the  explanatory  variables  Russell  1000  Growth  Index  (A’rlg), 
Russell  1000  Value  Index  (XRLV),  Russell  Value  Index  (XRUj),  and  Russell  2000 
Growth  Index  (Aruo)-  We  change  the  orientation  from  “return”  to  “negative  return” 
to  be  consistent  with  the  orientation  of  a  loss  random  variable  in  this  dissertation. 
The  indices  classify  the  style  of  the  fund;  see  American  Optimal  Decisions,  Inc.  (2011) 
for  details.  There  are  v  =  1264  total  observations  available. 

We  start  by  considering  a  linear  model  f\  ( x )  =  cq  +  crlg^rlg  +  crlv^rlv  + 
cruj^ruj  +  cruo^ruo  and  compare  the  obtained  approximate  regression  vectors  for 
least  squares,  quantile,  and  superquantile  regression  models  under  a  probability  level 
a  =0.75  and  0.90,  as  shown  in  Rows  2-6  of  Table  12.  DSqRv  is  solved  through 
pNum  with  Simpson’s  rule  as  the  integration  scheme  and  //  =  1000  subintervals,  while 
quantile  regression  is  carried  out  directly  in  Portfolio  Safeguard  Shell  Environment 
(American  Optimal  Decisions,  Inc.,  2011).  Table  12  (last  column)  also  shows  the  cor¬ 
responding  adjusted  coefficients  of  determination.  The  fits  are  good  and  a  majority  of 
the  variability  in  the  data  is  captured.  However,  the  small  values  of  cruo  and  also  the 
corresponding  p- value  from  the  least  squares  regression  point  to  the  possible  merit 
of  dropping  X 'ruo  as  explanatory  variable.  We  from  now  on  focus  on  superquan¬ 
tile  regression.  A  new  model  f2(x)  =  c0  +  crlg^rlg  +  crlv^rlv  +  cruj^ruj  yields 
the  approximate  regression  vectors  of  Table  12  (Rows  7-8),  which  also  shows  the  ob¬ 
tained  adjusted  coefficients  of  determination  R„  Adj.  The  fact  that  we  analyze  R2a  Adj 
instead  of  R2a  enable  us  to  better  compare  fits  across  models  with  different  numbers 
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Model 

Regression 

a 

c0 

CRLG 

crlv 

cruj 

cruo 

E>  2 

na,Adj 

LS 

NA 

0.0010 

-0.5089 

-0.5180 

0.0484 

0.0061 

0.9823 

Quantile 

0.75 

0.0045 

-0.5438 

-0.4518 

0.0159 

0.0173 

— 

fi 

0.90 

0.0089 

-0.5177 

-0.4602 

0.0156 

-0.0001 

— 

Super- 

0.75 

0.0095 

-0.5036 

-0.4723 

0.0192 

0.0009 

0.8731 

quantile 

0.90 

0.0138 

-0.4837 

-0.4912 

0.0223 

-0.0019 

0.8718 

h 

Super- 

0.75 

0.0095 

-0.5028 

-0.4728 

0.0200 

— 

0.8733 

quantile 

0.90 

0.0138 

-0.4855 

-0.4906 

0.0210 

— 

0.8720 

0.75 

0.0137 

-0.8228 

— 

— 

— 

0.7380 

0.90 

0.0218 

-0.8189 

— 

— 

— 

0.7248 

0.75 

0.0321 

— 

-1.0668 

— 

— 

0.5940 

h 

Super- 

0.90 

0.0475 

— 

-1.0727 

— 

— 

0.5702 

quantile 

0.75 

0.0515 

— 

— 

-0.7745 

— 

0.4103 

0.90 

0.0714 

— 

— 

-0.6949 

— 

0.4162 

0.75 

0.0344 

— 

— 

— 

-0.5498 

0.3962 

0.90 

0.0512 

— 

— 

— 

-0.5145 

0.2593 

Table  12.  Example  D:  Approximate  least  squares  (LS),  quantile,  and  superquantile 
regression  vectors  and  R^a dj  f°r  m°dels  /i,  fo ,  and  fy. 


of  explanatory  variables.  In  comparison,  the  fit  improves  slightly  by  dropping  Aruo- 
We  further  reduce  the  model  to  a  single  explanatory  variable,  fa(x)  =  co  +  CiXi, 
with  i  e  {RLG,  RLV,  RUJ,  RUO},  and  examine  the  four  possibilities  in  Rows  9-16 
of  Table  12.  We  find  that  -R^Adj  deteriorates,  but  only  moderately  for  the  model 
Co  +  crlgA^rlg-  This  simple  model  captures  much  of  the  variability  in  the  data  set. 
A  somewhat  poorer  fit  is  achieved  by  Arlv,  which  is  illustrated  in  Figure  15,  for  a  = 
0.90.  That  figure  also  depicts  the  corresponding  quantile  and  least  squares  regression 
lines.  It  is  apparent  that  superquantile  regression  provides  a  distinct  perspective  from 
the  other  regression  techniques  of  potentially  significant  value  to  a  decision  maker. 


E.  U.S.  NAVY  HELICOPTER  PILOTS  DATA 

This  example  considers  the  results  of  an  online  survey  of  winged  Naval  heli¬ 
copter  pilots  of  the  U.S.  Navy;  see  Phillips  (2011)  for  details.  Her  goal  is  to  verify 
if  helicopter  pilots  back  pain  is  a  concern  among  the  helicopter  community  and  to 
define  this  problem’s  implications.  Although  this  is  an  important  and  real  issue  in 
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the  helicopter  community,  we  do  not  use  the  superquantile  regression  technique  de¬ 
veloped  in  this  dissertation  to  estimate  helicopter  pilots’  back  pain  frequency  due  to 
the  categorical  nature  of  this  random  variable.  Instead  we  utilize  the  available  data 
set  to  estimate  the  total  flight  hours  Y  for  naval  helicopter  pilots.  As  explanatory 
variables  we  have  the  number  of  years  a  helicopter  pilot  has  flown  for  the  U.S.  Navy 
(Ayears),  and  their  body  mass  index  (Xbmi),  available  through  a  formula  derived  using 
the  available  data  on  height  and  weight  of  helicopter  pilots. 

Since  we  only  consider  those  pilots  that  answered  questions  in  the  “Demo¬ 
graphics”  and  “Flight  Hour  Info”  sections,  see  Appendix  A  in  Phillips  (2011),  of  the 
648  pilots  that  completed  the  survey,  we  only  use  633  observations.  Figure  16  displays 
these  observations  in  a  pairwise  scatterplot  matrix.  As  expected,  one  clearly  depicts 
the  linear  correlation  between  years  an  helicopter  pilot  has  flown  for  the  U.S.  Navy 
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and  the  estimated  total  number  of  flight  hours. 

We  first  consider  a  regression  function  of  the  form  f(x)  =  Cq  +  cyears%ears  + 
cbmi^bmi  and  vary  the  probability  levels  a.  Rows  2-10  in  Table  13  report  the  obtained 


Model 

Regression 

a 

Co 

Cyears 

cbmi 

Rl 

c>2 

-^c^Adj 

Least  Squares 

NA 

51.70 

161.22 

0.9176 

0.7780 

0.7773 

0.25 

-48.71 

146.67 

0.3418 

— 

— 

A 

Quantile 

0.50 

-55.56 

177.78 

0.0000 

— 

— 

0.75 

0.0000 

200.00 

0.0000 

— 

— 

0.99 

1233.3 

322.49 

-46.565 

— 

— 

0.25 

-47.03 

200.00 

0.000 

0.6094 

0.6081 

Superquantile 

0.50 

2.1827 

208.69 

-0.1809 

0.6205 

0.6193 

0.75 

116.71 

223.21 

-2.6097 

0.6147 

0.6134 

0.99 

244.33 

323.79 

-75.903 

0.4754 

0.4738 

Least  Squares 

NA 

74.84 

161.30 

— 

0.7780 

0.7776 

0.25 

-40.00 

146.67 

— 

— 

— 

f'2 

Quantile 

0.50 

-55.56 

177.78 

_ 

— 

_ 

0.75 

0.0000 

200.00 

— 

— 

— 

0.99 

-93.75 

343.75 

— 

— 

— 

0.25 

-47.03 

200.00 

— 

0.6094 

0.6088 

Superquantile 

0.50 

-1.781 

208.57 

— 

0.6205 

0.6199 

0.75 

49.721 

223.13 

— 

0.6146 

0.6140 

0.99 

247.55 

350.00 

— 

0.4538 

0.4529 

Table  13.  Example  E:  Regression  vectors,  K2a)  and  Ra,A<ij  f°r  m°del  fi(x)  =  Co  + 
Cyears^years  +  cbmi^bmi  and  f2(x)  =  c0  +  cyearsa;years  at  varying  probability  levels  a. 


solution  vectors  for  model  f\ ,  the  corresponding  coefficients  of  determination  R2a , 
and  adjusted  coefficients  of  determination  R^Adj-  The  fits  are  reasonable  but  the 
p- value  for  cbmi  from  the  least  squares  regression  suggests  the  possible  benefit  of 
dropping  A"bmi  as  explanatory  variable.  With  this  in  mind,  we  drop  the  explanatory 
random  variable  A"bmi  from  our  new  model.  Before  we  move  on  to  the  next  model,  we 
notice  that  the  obtained  0. 99-quantile  regression  solution  vector  is  correct  although 
its  intercept  looks  way  larger  compared  to  other  approximate  solution  vectors. 

Second  we  consider  a  single- variable  model  of  the  form  f2(x)  =  c0  +  cyearsa;years, 


and  obtain  the  results  presented  in  Rows  11-19  of  the  same  Table  13.  The  adjusted 
coefficients  of  determination  R^Adj  slightly  increase  in  the  cases  of  least  squares  and 
superquantile  regressions  techniques,  where  a  G  {0.25,  0.50,  0.75}.  Figure  17(a)  shows 
the  corresponding  regression  lines  for  the  linear  model  /2,  at  a  fixed  probability  level 
a  =  0.50.  It  is  interesting  to  notice  that  the  quantile  regression  line  for  a  =  0.50  has 
a  negative  intercept,  while  the  least  squares  and  superquantile  regression  functions 
intercept  the  y-axis  at  higher  values.  Another  aspect  we  learn  from  Figure  17(a)  is 
the  importance  of  the  magnitude  of  errors  in  regression.  This  is  evident  when  we 
compare  both  quantile  and  superquantile  regression  lines.  Superquantile  regression 
responds  to  the  observations  that  have  larger  errors,  emphasizing  those  observations 
that  we  might  consider  outliers. 

Both  least  squares  and  quantile  regression  functions  cross  each  other  at  xyears  = 
7.91  years.  The  observation  ( xyeaTS,y )  =  (2000,4)  shifts  the  least  squares  regression 
line  upwards  for  smaller  values  of  xyears,  while  the  large  number  of  helicopter  pilots 
with  3  and  4  years  flying  for  the  U.S.  Navy  with  low  total  flight  hours  shifts  the 
quantile  regression  model  downwards. 

In  Figure  17(b),  we  see  the  least  squares  regression  model  and  the  superquan¬ 
tile  regression  functions  for  probability  levels  a  G  {0.25,0.50,0.75,0.99,0.999}.  We 
notice  that  the  superquantile  regression  models  for  a  =  0.99  and  a  =  0.999  have 
higher  slopes  when  compared  to  the  remainder  of  superquantile  regression  lines.  Even 
the  difference  in  slopes  established  by  the  small  increase  of  0.009  in  a  provides  us  the 
conclusion  that  deciding  which  probability  level  to  use  in  an  analysis  is  a  hard  process. 
Since  obtaining  these  superquantile  regression  models  is  not  too  costly,  we  consider 
important  to  include  several  choices  of  probability  levels  a  in  any  analysis. 

From  this  example  we  conclude  that  superquantile  regression  helps  analysts 
address  important  questions  such  as  level  and  trends  of  the  average  1%  highest  total 
flight  hours  (in  the  case  of  a  =  0.99,  in  Figure  17(b)),  understand  if  deployment  rules 
should  be  reviewed,  and  if  these  cases  should  be  analyzed  before  reassigning  them  for 
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(a)  Regression  lines  for  model  f2(%)  =  cq  +  Cyearsxyears. 


(b)  Least  squares  and  superquantile  regression  functions  for  model 
h  =  Co  +  cyears%ears,  for  a  €  {0.25, 0.50, 0.75, 0.99, 0.999}. 


Figure  17.  Example  E:  Superquantile  regression  applied  to  the  U.S.  Navy  helicopter 
pilots  data. 
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future  deployments. 


F.  PORTUGUESE  SUBMARINERS  EFFORT  INDEX 

The  next  example  is  based  on  a  data  set  provided  by  the  Portuguese  Navy 
Submarine  Squadron.  We  seek  to  estimate  the  random  variable  Y  that  represents 
the  effort  index  of  the  Portuguese  submariners.  This  index  was  created  as  a  decision 
tool  to  support  human  resource  management  inside  the  Submarine  Squadron.  Once  a 
sailor  becomes  a  submariner,  his  career  depends  mainly  on  the  Submarine  Squadron. 
The  Commanding  Officer  of  the  Submarine  Squadron  has  the  power  of  assigning  a 
submariner  for  a  mission,  if  there  is  the  need  to  embark  an  extra  element  or  substitute 
someone  onboard.  It  is  crucial  to  support  such  decisions  with  a  tool  that  emphasizes 
who  is  more  “available”  for  the  mission. 

The  idea  behind  this  index  is  to  build  in  the  near  future  a  prototype  for 
submariners  careers  which  helps  determine  selection  criteria  for  future  Submarine 
Squadron  personnel  recruitment  and  also  understand  who  has  been  overemployed. 

In  the  data  set,  we  have  103  observations  with  five  possible  explanatory  vari¬ 
ables:  years  since  a  submariner  has  gained  the  insignia  of  the  Portuguese  submarine 
service  (Xdoiphins),  years  a  submariner  has  embarked  on  surface  warships  (Xsurf),  years 
a  submariner  has  been  ashore  (Washore),  total  submarine  navigation  hours  (Xsub),  and 
submariners  age  (A"age). 

Naturally  one  thinks  that  age  is  an  important  factor  that  needs  to  be  taken 
into  consideration.  The  idea  of  older  submariners  having  more  experience  due  to 
more  training  has  not  always  been  true,  and  that  issue  raised  the  question  of  how  to 
quantify  training  and  expertise.  Figure  18(a)  shows  that  although  age  is  important, 
it  does  not  directly  translate  the  effort  of  a  submariner.  For  example,  a  39-year  old 
submariner  can  have  an  effort  index  as  low  as  5  or  as  high  as  22.  Such  discrepancies 
cause  discomfort  among  fellow  submariners. 

Another  factor  that  is  also  relevant  for  designing  a  prototype  for  submariners 
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Submariners  Age 

(a)  Effort  index  versus  submariners  ages. 


Years  with  Dolphins 


(b)  Effort  index  versus  years  submariners  have  the  dolphins. 

Figure  18.  Example  F:  Portuguese  submariners  effort  index  against  their  ages  and 
years  they  have  the  submariners  insignia. 
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careers  is  the  number  of  years  a  submariner  has  the  insignia  of  the  Portuguese  subma¬ 
rine  service.  Analogously  to  age,  one  thinks  that  the  larger  this  number,  the  higher 
the  effort  index.  Figure  18(b)  shows  how  the  effort  index  behaves  with  the  number  of 
years  a  submariner  has  the  dolphins,  and  we  realize  there  is  an  increasing  variability 
among  these  observations. 

For  now  we  consider  higher  effort  indices  to  be  more  detrimental  than  small 
indices  for  the  completion  of  the  Submarine  Squadron  mission,  i.e.,  overemploying  is 
considered  worse  than  underemploying  a  submariner. 

One  of  the  goals  with  this  example  is  to  show  that  superquantile  regression 
helps  us  better  visualize  what  may  cause  the  discrepancies  in  effort  indices  among 
submariners. 

We  next  observe  the  possible  correlations  between  variables  in  the  data  set.  In 
Figure  19,  we  have  the  scatterplot  matrix  of  the  data  set  for  some  of  the  explanatory 
random  variables,  Adoiphins,  Xsurf,  and  A"ashore,  against  the  effort  index  Y.  Here  we 
can  observe  a  linear  correlation  between  the  number  of  years  a  submariner  has  the 
dolphins  and  the  effort  index.  Since  the  total  submarine  navigation  hours  Amours  is 
a  factor  considered  in  the  computation  of  the  effort  index,  their  correlation  is  very 
high  and  we  do  not  include  this  variable  in  the  scatterplot  or  later  in  the  analysis. 
We  explore  several  possible  models  and  compare  the  obtained  solution  vectors  and 
coefficient  of  determination  results  for  further  analysis. 

In  Figure  20,  we  plot  the  submariners  ages  against  the  number  of  years  a 
submariner  has  the  insignia  of  the  Portuguese  submarine  service.  A  small  detail 
that  we  encounter  here  is  the  lack  of  observations  for  values  of  Adoiphins  between  4 
and  7  years.  This  lack  of  observations  is  due  to  fact  that  Portugal  acquired  the 
Tridente- class  submarines  in  2010,  and  the  few  years  prior  were  dedicated  to  training 
the  existing  submariners  to  a  completely  new  technology.  This  process  required  the 
Portuguese  Navy  to  delay  the  submariners  course  until  after  the  reception  of  the  new 
assets. 
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Figure  19.  Example  F:  Portuguese  submariners  effort  index  scatterplot  matrix. 
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Figure  20.  Example  F:  Submariners  ages  against  the  number  of  years  they  have  the 
submariners  insignia. 

We  first  start  with  a  linear  model  of  the  form  f\  (x)  =  c0  +  CjXj,  with  i  G 
{dolphins,  surf,  ashore,  age},  i.e.,  we  only  include  one  explanatory  variable  at  a  time. 
We  then  consider  a  linear  model  i^Cdoiphms^doiphms  +  cagea;a ge.  Table  14  presents  the 
obtained  solution  vectors  and  the  corresponding  coefficients  of  determination,  for  a 
probability  level  a  =  0.75.  The  years  the  submariners  embark  in  surface  warships 
and  the  number  of  years  they  spend  ashore  between  embarks  are  two  explanatory 
variables  that  we  discard  from  this  point  on,  because  they  both  play  a  very  negligible 
role,  as  determined  by  -Rq75,  even  though  they  might  be  important  in  conjunction 
with  other  explanatory  random  variables.  Rows  6,  11,  and  16  of  Table  14  report 
the  regression  vectors  for  model  /2-  We  realize  that  the  coefficient  of  determination 
improves  in  these  cases. 
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Regression 

c0 

^dolphins 

^surf 

^ashore 

^age 

p2 

-^0.75 

3.1365 

0.8643 

— 

— 

— 

0.7452 

11.9111 

— 

-0.4448 

— 

— 

0.0182 

Least  Squares 

8.3782 

— 

— 

0.7491 

— 

0.2314 

-17.6369 

— 

— 

— 

0.7983 

0.4845 

8.1218 

0.9918 

— 

— 

-0.1711 

0.7512 

2.8690 

1.0878 

— 

— 

— 

— 

15.8063 

— 

-0.6190 

— 

— 

— 

Quantile 

10.7798 

— 

— 

0.7945 

— 

— 

-19.2554 

— 

— 

— 

0.9084 

— 

11.7357 

1.3037 

— 

— 

-0.3065 

— 

2.9811 

1.2172 

— 

— 

— 

0.5866 

17.6450 

— 

0.3456 

— 

— 

0.0038 

Superquantile 

15.4621 

— 

— 

0.5666 

— 

0.0212 

-27.0234 

— 

— 

— 

1.2048 

0.2403 

7.3697 

1.3430 

— 

— 

-0.1558 

0.5939 

Table  14.  Example  F:  Regression  vectors  and  R2a  for  linear  models  f\  and  f-2,  at  a 
fixed  probability  level  a  =  0.75. 

In  Figure  21,  we  plot  the  linear  model  fi(x)  =  cq  +  Cdoiphins^doiphins  for  least 
squares,  0.60-quantile  and  0.60-superquantile  regressions.  All  three  obtained  regres¬ 
sion  functions  have  completely  distinct  slopes.  The  blue  line  representing  the  quantile 
regression  gives  us  the  notion  of  where  the  40%  worst  cases  are,  while  the  green  line 
representing  the  superquantile  regression  model  provides  us  the  average  of  these  worst 
indices. 

As  stated  at  the  beginning  of  this  example,  the  orientation  of  the  problem  is 
such  that  higher  effort  indices  are  worse.  However  and  as  illustration  we  believe  it  is 
very  beneficial  to  look  at  the  cases  where  the  submariners  effort  is  low  and  therefore  we 
flip  the  orientation  of  this  problem  for  the  next  figure  in  order  to  highlight  those  cases 
that  should  also  be  taken  into  consideration.  This  is  a  good  example  where  using  one 
of  both  orientations  in  solving  the  problem  is  possible  depending  on  where  the  major 
concern  lies.  Figure  22  shows  the  least  squares  regression  model  and  the  0.75-quantile 
and  0.75-superquantile  regression  functions.  We  add  the  0.25-quantile  regression  fit 
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Figure  21.  Example  F:  Regression  lines  for  model  fi(x)  =  Cq  +  Cdoiphins^doiphms- 

and  the  new  0.75-superquantile  regression  function,  after  flipping  the  orientation  of 
the  problem  and  solving  for  the  new  superquantile  regression  problem,  marked  with  an 
asterisk  in  the  legend,  and  displayed  in  Figure  22  by  a  dashed  green  line.  This  dashed 
line  has  the  same  meaning  as  the  full  green  line  but  for  the  lowest  25%  presented  effort 
indices.  We  consider  that  taking  care  of  both  ends  of  the  spectrum  will  expedite  the 
process  of  smoothing  the  submariners  career,  but  this  is  not  fully  pursued  here.  We 
finish  using  the  linear  model  f\(x)  =  c0  +  Cdoiphms^doiphins  by  showing  Figure  23, 
where  clearly  the  0.25-superquantile  regression  model  is  completely  different  of  the 
0.75-superquantile*  regression  model. 

Second,  we  consider  a  quadratic  model  of  the  form  /3(a;)  =  Co  +  cagexage  + 
Cage2^agc-  Table  15  shows  the  obtained  regression  vectors  and  the  corresponding  co- 
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Figure  22.  Example  F:  Least  squares,  quantile  and  superquantile  regression  func¬ 
tions  for  linear  model  /i.  An  asterisk  indicates  that  the  0.75-superquantile  regression 
function  was  obtained  after  reversing  the  orientation  of  the  original  problem. 


Regression 

Co 

^age 

Cage2 

c>2 

-^o.rs 

Least  Squares 

-87.1182 

4.6498 

-0.05251 

0.5442 

Quantile 

-97.2181 

5.2652 

-0.0600 

— 

Superquantile 

-126.4859 

6.9812 

-0.0827 

0.3235 

Table  15.  Example  F:  Regression  vectors  and  R2a  for  quadratic  model  fs(x)  =  Cq  + 
Cage^age  +  Cage2^age>  with  a  =  0.75. 
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Figure  23.  Example  F:  Different  ct-superquantile  regression  functions  for  linear  model 
fi.  An  asterisk  indicates  that  the  0.75-superquantile  regression  function  was  obtained 
after  reversing  the  orientation  of  the  original  problem. 

efficients  of  determination,  which  are  larger  than  those  obtained  using  the  linear 
model.  We  plot  these  quadratic  models  in  Figure  24(a),  and  notice  that  the  0.75- 
superquantilc  regression  function  captures  the  effects  of  the  higher  effort  indices  and 
forms  an  interesting  curvature.  To  the  contrary,  the  0.75-quantile  regression  model 
does  not  seem  to  be  affected  by  such  observations  and  it  looks  almost  parallel  to  the 
least  squares  regression  model  for  a  40-year  old  submariner.  With  these  comments 
in  mind,  we  need  a  different  validation  analysis  tool  that  helps  us  understand  which 
observations,  if  any  exist,  should  be  carefully  checked  for  their  validity,  or  should  pos¬ 
sibly  be  seen  as  outliers.  Before  we  finish  this  example  and  as  seen  in  Section  II. C,  we 
utilize  the  Cook’s  distance  concept  first  applied  to  the  case  of  least  squares  regression, 
then  to  the  case  of  superquantile  regression.  Since  there  is  more  than  one  possible 
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(a)  Effort  index  versus  submariners  ages. 


(b)  High  leverage  observations  for  quadratic  model  f 3 . 

Figure  24.  Example  F:  Quadratic  regression  models  fs  at  probability  level  a  =  0.75. 
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cut-off  value  for  such  Cook’s  distances,  we  resort  to  the  commonly  used  formula  4/z/ 
in  both  cases,  least  squares  and  0.75-superquantile  regression,  where  we  have  v  =  103 
observations  in  this  example.  Figure  25(a)  shows  the  Cook’s  distances  for  the  least 
squares  quadratic  model,  while  Figure  25(b)  the  Cook’s  distances  for  the  superquan¬ 
tile  regression  technique.  We  clearly  see  that  observations  number  24,  40,  49,  and  80, 
emphasized  by  the  red  dots  in  Figure  24(b)  and  25(a),  are  considered  high  leverage 
observations  for  least  squares  regression.  In  the  context  of  superquantile  regression, 
we  see  that  observations  number  1,  19,  24,  27,  28,  31,  and  32,  emphasized  by  the  green 
dots  in  Figure  24(b)  and  25(b),  are  considered  high  leverage  observations.  Curiously, 
in  our  example  only  observation  24  is  coincidently  considered  high  leverage  for  both 
regression  techniques;  plotted  in  orange  in  Figure  24(b).  Another  interesting  detail 
consists  on  where  the  high  leverage  observations  are  located  in  this  same  plot.  We 
realize  that  these  observations  drive  the  superquantile  regression  fit  downwards  since 
they  influence  the  0.75-quantilc  regression  function,  and  consequently  also  the  result¬ 
ing  0.75-superquantile  regression  function  as  an  average  of  all  observations  above  the 
quantile  regression  fit. 

Further  analysis  should  be  done  for  this  example  but  it  goes  beyond  the  scope 
of  this  dissertation.  However  we  conclude  that  superquantile  regression  is  an  impor¬ 
tant  analysis  tool  that  when  used  wisely  gives  the  decision  maker  powerful  information 
on  the  upper  tail  of  the  random  variable  of  concern.  With  the  Cook’s  distance  con¬ 
cept  applied  to  the  superquantile  regression,  we  also  identify  those  observations  in 
the  data  set  that  influence  the  obtained  fit  and  that  should  be  checked  for  validity. 

G.  UNCERTAINTY  QUANTIFICATION 

The  next  example  arises  in  uncertainty  quantification  of  a  rectangular  cross 
section  of  a  short  structural  column,  with  depth  d  and  width  w,  under  uncertain  yield 
stress  and  uncertain  loads;  see  Eldred  et  al.  (2011).  Assuming  an  clastic-perfectly 
plastic  material,  a  limit-state  function  that  quantifies  a  relationship  between  loads 
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(a)  Cook’s  distances  for  least  squares  regression. 
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(b)  Cook’s  distances  for  superquantile  regression. 

Figure  25.  Example  F:  Cook’s  distances  for  least  squares  and  superquantile  regression 
fits  using  quadratic  model  /3(x)  =  Co  +  cagexage  +  cage2£age,  at  a  =  0.75. 
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and  capacity  is  described  by  the  random  variable 


y  =  _l  +  J*  + 


X| 


(IV.l) 


wd?X3  w2d2X. f 

where  the  bending  moment  load  X\  and  the  axial  load  X2  are  normally  distributed 
with  mean  2,000  and  standard  deviation  400,  and  mean  500  and  standard  deviation 
100,  respectively,  and  the  material’s  yield  stress  X3,  is  lognormally  distributed  with 
parameters  5  and  0.5,  with  Xll  X2,  and  X3  independent.  We  observe  that  the  second 
term  in  (IV.l)  is  the  ratio  of  moment  load  to  the  column’s  moment  capacity,  and 
the  third  term  is  the  square  of  the  ratio  of  the  axial  load  to  the  axial  capacity.  The 
constant  —1  is  introduced  for  the  sake  of  a  translation  such  that  positive  realizations 
of  Y  represent  “failure”  and  negative  ones  correspond  to  a  situation  where  load  effects 
remain  within  the  capacity  of  the  column.  (We  note  that  the  orientation  of  the  limit- 
state  function  is  switched  compared  to  that  of  Eldred  et  al.  (2011)  for  consistency 
with  our  focus  on  “losses”  instead  of  “gains.”)  We  set  the  width  w  =  3,  and  the 
depth  d  =  12. 


Model 

a 

Co 

102ci 

104c2 

104c3 

Rl 

0.999 

-0.6797 

0.0156 

7.9000 

-9.1100 

0.154 

0.99 

-0.8084 

0.0150 

3.8000 

-8.2700 

0.190 

fi 

0.9 

-0.8579 

0.0107 

1.5900 

-7.7000 

0.260 

0.75 

-0.8705 

0.0090 

1.0800 

-7.5900 

0.301 

LS 

-0.8827 

0.0070 

0.5921 

-7.7180 

0.571 

0.999 

-1.0457 

1.8640 

0.0300 

— 

0.902 

0.99 

-1.0450 

1.6182 

0.0400 

— 

0.891 

h 

0.9 

-1.0308 

1.3393 

0.0200 

— 

0.894 

0.75 

-1.0261 

1.2595 

0.0200 

— 

0.893 

LS 

-1.0179 

1.1315 

0.0056 

— 

0.979 

Table  16.  Example  G:  Approximate  regression  vectors  and  coefficients  of  determina¬ 
tion  for  superquantile  regression  with  varying  a  and  least  squares  (LS)  regression. 


We  seek  to  quantify  the  “uncertainty”  in  Y  by  surrogate  estimation.  Of  course, 
in  this  case,  this  is  hardly  necessary;  direct  use  of  (IV.l)  suffices.  However,  in  practice, 
an  analytic  expression  for  a  limit-state  function,  as  in  (IV.l),  is  rarely  available.  One 
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then  proceeds  with  determining  a  regression  function  /  :  IR3  — »  M,  based  on  a  sample 
of  input-output  realizations,  such  that  f(X),  with  X  =  (Xi,X2,  X3),  approximates 
Y  in  some  sense.  To  mimic  this  situation,  we  consider  a  sample  of  size  50000  drawn 
independently  from  A",  the  corresponding  realizations  of  Y  according  to  (IV.  1),  and 
two  forms  of  the  regression  function.  The  first  model  is  linear  and  takes  the  form 

fi(x)  =  c0  +  ci^i  +  c2x2  +  032:3 

and  the  second  one  utilizes  basis  functions  h\(x)  =  x\/x 3  and  h2(x )  =  ( x2/x3 )2  and 
is  of  the  form 

f2(x)  =  c0  +  CiXi/x3  +  c2x\jx\. 

In  view  of  (IV.  1),  we  expect  /1  to  be  unable  to  capture  interaction  effects  between 
variables  and  its  explanatory  power  may  be  limited.  In  contrast,  f2  uses  the  correct 
basis  functions,  but  even  then  f2(X)  may  deviate  from  Y  due  to  the  finite  sample  size 
used  to  determine  the  regression  vector.  Table  16  confirms  this  intuition  by  showing 
approximate  regression  vectors  for  both  models  over  a  range  of  probability  levels  a 
as  well  as  for  the  least  squares  (LS)  regression.  The  vectors  are  obtained  in  less  than 
15  seconds  by  solving  P^m,  with  u  =  50000,  11  =  1000,  and  Simpson’s  rule.  The  last 
column  of  Table  16  shows  R2a)  which  is  low  for  f\  and  high  for  f2  as  expected. 

In  uncertainty  quantification  and  elsewhere,  surrogate  estimates  such  as  fi(X) 
and  fr2  ( X )  are  important  inputs  to  further  analysis  and  simulation.  Table  17  illus¬ 
trates  the  quality  of  these  surrogate  estimates  in  this  regard  by  showing  various 
statistics  of  fi(X)  and  f2(X)  as  compared  to  those  of  Y.  Row  2,  Columns  3-10  show 
estimated  mean,  standard  deviation,  superquantiles  at  0.75,  0.9,  0.99,  0.999,  probabil¬ 
ity  of  failure,  and  buffered  probability  of  failure  (see  (II. 5))  of  Y,  respectively,  using  a 
sample  size  of  10 7  and  standard  estimators.  Coefficients  of  variation  for  these  estima¬ 
tors  are  ranging,  approximately,  from  10~5  for  the  mean  to  0.02  for  the  probability  of 
failure.  Rows  3-6  of  Table  17  show  similar  results,  using  the  same  sample,  for  f\  (V), 
with  a  =  0.999,  0.99,  0.9,  and  0.75,  respectively.  We  notice  that  as  the  probability 
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level  a  increases,  fi(X)  becomes  increasingly  conservative.  In  fact,  for  a  =  0.999, 
fi(X)  is  conservative  in  all  statistics.  Superquantile  regression  with  smaller  proba¬ 
bility  level  a  fails  to  be  conservative  for  some  “upper-tail”  statistics.  Interestingly, 
f\  (X)  based  on  a  is  conservative  for  all  superquantiles  up  to  and  including  qa  in  these 
tests.  These  observations  indicate  that  in  surrogate  estimation  the  probability  level  a 
should  be  selected  in  accordance  with  the  superquantile  statistic  of  interest.  We  can 
then  expect  to  obtain  conservative  estimates  even  for  relatively  poor  surrogates.  Row 
7  of  Table  17  gives  corresponding  results  for  fi(X)  under  the  least  squares  regression 
fit.  While  this  fit  provides  an  accurate  estimate  of  the  mean  (see  Column  3),  the 
upper-tail  behavior  is  represented  in  a  nonconservative  manner. 

Rows  8-12  of  Table  17  show  comparable  results  to  those  above,  but  for  the 
/2( A")  models.  As  also  indicated  in  Table  16,  /2(A )  is  a  much  better  surrogate 
of  Y  than  fi(X)  and  essentially  all  quantities  improve  in  accuracy.  For  example, 
/2( A”)  based  on  superquantile  regression  overestimates  the  buffered  failure  probabil¬ 
ity  only  moderately  with  a  =  0.999,  0.99,  and  0.9,  and  slightly  underestimate  with 
a  =  0.75;  see  the  last  column  of  Table  17.  In  contrast,  least  squares  regression  un¬ 
derestimates  the  buffered  failure  probability  substantially  even  for  this  supposedly 
“accurate”  model.  Of  course,  least  squares  regression  centers  on  conditional  expecta¬ 
tions  and  as  basis  for  estimating  tail  behavior  may  hide  potentially  dangerous  risks. 
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H.  SUPERQUANTILE  TRACKING 

To  finish  Chapter  IV,  we  return  to  the  first  example  in  Section  IV. A,  and 
consider  a  loss  random  variable 


V  =  X1  +  X2e, 

where  e  is  a  standard  normal  random  variable  and  X  =  (Xi,X2)  is  uniformly  dis¬ 
tributed  on  [—1, 1]  x  [0, 1],  with  e,  Xi,  and  X2  independent.  We  consider  a  regression 
function  of  the  form  /(x)  =  Co  +  C\X\  +  c2x2  and  set  a  =  0.90. 

We  examine  conditional  values  of  Y  given  realizations  of  X  =  (Xi,X2),  i.e., 
superquantile  tracking.  For  x  =  (xi,x2),  Y(x)  =  Y\X  =  x  is  normally  distributed 
with  mean  x\  and  variance  x2.  Consequently,  it  is  straightforward  to  compute  that 
q0,g(Y(x))  =  X\  +  1. 7550x2-  Table  2  shows  vectors  that  only  track  q09(Y(-))  approx¬ 
imately,  as  c0,  Ci,  and  c2  deviate  from  0,  1,  and  1.755,  respectively.  In  fact,  there  is 
in  general  no  guarantee  that  every  regression  function  /  will  satisfy  /(x)  =  qa(Y(x)) 
for  all  x,  even  for  large  sample  sizes.  As  indicated  by  Proposition  II. 5,  however,  a 
superquantile  of  Y(x)  can  be  estimated  by  approximating  a  degenerate  distribution 
of  (X,  Y)  at  x. 


X  range: 

1 

_  h— 1 

X 

o’ 

[0.45,  0.55]2 

[0.495, 0.505]2 

Cq  H-  0.5ci  “h  0.5c2 

(1.349,  1.575) 

(1.329,  1.475) 

(1.330,  1.473) 

Co 

(0.029,  0.123) 

(-2.414,  1.784) 

(-23.715,  18.329) 

Cl 

(0.971,  1.075) 

(-0.229,  3.597) 

(-11.063,  25.656) 

C2 

(1.523,  1.975) 

(-1.686,  5.186) 

(-33.916,  35.701) 

Table  18.  Example  H:  Approximate  95%  confidence  intervals  when  tracking  qo_g(Y(-)) 
near  x  =  (0.5, 0.5)  using  shrinking  sampling  ranges  for  X.  The  correct  value 
<?o.9(T((0.5,  0.5)))  =  1.378. 

Table  18  shows  such  “local”  estimates  of  q09(Y(x))  near  x  =  (0.5,  0.5).  Specifi¬ 
cally,  using  v  =  500  we  compute  Co,  ci,  and  c2  by  solving  p  as  above,  with  X  sampled 
uniformly  from  [—1, 1]  x  [0, 1].  We  repeat  these  calculations  10  times  with  independent 
samples  and  obtain  the  aggregated  statistics  of  Column  2  of  Table  18.  The  second  row 
gives  an  approximate  95%  confidence  interval  for  the  mean  value  of  cq  +  0.5ci  +  0.5c2 
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across  the  10  meta-replications.  The  interval  contains  g0.g(Y((0.5,  0.5)))  =  1.3775, 
but  is  somewhat  wide.  Proposition  II. 5  indicates  that  sampling  from  a  smaller  set 
[0.45,0.55]  x  [0.45,0.55]  will  tend  to  improve  the  estimate  of  go.9(Y((0.5,  0.5))).  Col¬ 
umn  3  of  Table  18  illustrates  this  effect,  by  showing  results  comparable  to  those  of 
Column  2  and  Row  2,  but  for  the  smaller  interval.  As  expected,  the  confidence  in¬ 
terval  for  Co  +  0.5ci  +  0.5c2  narrows  around  the  correct  value.  The  last  column  shows 
similar  results,  but  now  for  sampling  of  X  uniformly  on  [0.495,0.505]  x  [0.495,0.505]. 
The  estimate  of  go.9(X((0-5,  0.5)))  improves  only  marginally,  with  the  residual  uncer¬ 
tainty  being  due  to  the  inherent  variability  in  the  (relatively  small)  samples.  The 
narrow  sampling  interval  causes  the  last  estimate  to  be  similar  to  that  obtained  by 
the  standard  empirical  estimate  from  500  realizations  of  Y((0.5,  0.5)),  which  yields 
the  confidence  interval  (1.312, 1.462). 

While  sampling  on  smaller  sets  gives  better  local  estimates  of  g0.g(Y(x)),  the 
global  picture  deteriorates.  The  last  three  rows  of  Table  18  show  corresponding 
approximate  95%  confidence  intervals  for  c0,  c i,  and  c2,  respectively.  While  c0  +  cia;i  + 
C2X 2  generated  by  the  set  [—1, 1]  x  [0, 1]  provides  a  reasonably  good  global  picture 
of  q0mg(Y(x)),  the  smaller  sets  lose  that  quality  as  seen  from  the  wide  confidence 
intervals.  In  view  of  the  above  results,  we  see  that  an  analyst  that  can  choose  “design 
points,”  i.e. ,  points  x  at  which  to  sample  Y(x),  should  balance  the  need  for  accurate 
local  estimates  with  that  of  global  estimates.  In  fact,  even  if  the  primary  focus  is 
on  estimating  qa(Y(x))  for  a  given  x,  as  we  see  in  this  example,  it  may  be  equally 
effective  to  spread  the  samples  of  X  near  x  instead  of  exactly  at  x,  and  then  obtain 
some  global  information  about  qa(Y(-))  too.  Our  methodology  provides  a  flexible 
framework  for  estimating  qa(Y(x))  even  if  there  is  only  a  small  number  of  realization 
of  Y(x),  or  even  none,  available.  The  estimates  are  based  on  realizations  of  Y(x')  for 
x'  near  x. 

In  the  next  chapter  we  discuss  the  conclusions  taken  from  our  research  and 
suggest  possible  future  work. 
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V.  SUMMARY,  CONCLUSIONS,  AND 

FUTURE  WORK 

A.  SUMMARY  AND  CONCLUSIONS 

In  this  dissertation,  we  develop  a  novel  regression  framework,  superquantile 
regression,  that  naturally  extends  least  squares  and  quantile  regressions  to  contexts 
where  the  decision  maker  is  risk  averse  and  is  simultaneously  concerned  about  the 
magnitude  of  the  obtained  regression  errors.  As  opposed  to  squaring  these  errors  or 
by  looking  at  their  signs,  this  framework  for  superquantile  regression  weights  larger 
errors  increasingly  heavily  in  a  way  consistent  with  a  coherent  and  averse  risk  mea¬ 
sure,  the  superquantile  risk  measure.  We  use  superquantiles  directly  in  the  regression 
model  and  go  beyond  other  generalized  regression  techniques  that  approximate  condi¬ 
tional  superquantiles  by  various  combinations  of  conditional  quantiles,  with  the  only 
required  assumption  that  the  involved  random  variables  have  finite  second  moment. 

We  utilize  the  “Fundamental  Risk  Quadrangle”  concept  and  the  connections 
established  therein  between  distinct  measures  of  a  random  variable  whose  orientation 
is  such  that  upper-tail  realizations  are  unfortunate  and  low  realizations  are  favorable. 
We  rely  on  the  superquantile-based  risk  quadrangle  and  the  corresponding  relations 
between  measures  of  deviation,  risk,  and  error  applied  to  the  superquantile  as  the 
statistic  to  obtain  superquantile  regression  functions  as  optimal  solutions  of  an  error 
minimization  problem. 

Then  we  develop  the  fundamental  theory  for  superquantile  regression  by  defin¬ 
ing  its  regression  problem  as  an  error  minimization  problem.  We  examine  existence 
and  uniqueness  of  the  obtained  regression  functions,  and  we  establish  a  guaranteed 
unique  regression  vector  in  the  cases  where  the  loss  random  variable  and  the  chosen 
basis  functions  are  normally  distributed  with  a  positive  definite  variance-covariance 
matrix.  Next  we  analyze  consistency  and  stability  of  the  regression  functions  under 
perturbations  due  to  possible  measurement  errors  and  approximating  empirical  distri- 
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butions  generated  by  samples  of  the  underlying  data.  We  formulate  a  deviation-based 
superquantile  regression  problem  as  an  equivalent  minimization  problem  of  a  corre¬ 
sponding  measure  of  deviation  taken  from  the  superquantile-based  risk  quadrangle. 
This  new  minimization  problem  implies  computational  advantages  since  it  reduces  the 
number  of  variables  and  includes  a  simpler  objective  function.  We  also  provide  rate 
of  convergence  results  under  mild  assumptions  that  allow  us  to  use  an  approximate 
superquantile  regression  problem,  based  on  a  sample  of  the  true  distribution. 

Since  any  regression  framework  must  be  associated  with  means  of  assessing 
the  goodness  of  fit  of  a  computed  regression  vector,  we  define  three  validation  analy¬ 
sis  tools  for  quantile  and  superquantile  regressions:  the  coefficient  of  determination, 
the  adjusted  coefficient  of  determination,  and  Cook’s  distance.  We  first  analyze  the 
formulas  for  these  three  validation  analysis  tools  when  applied  to  least  squares  re¬ 
gression,  and  translate  them  into  measures  of  error  and  deviation  in  the  sense  of  the 
mean-based  quadrangle.  We  conclude  that  these  three  definitions  can  be  formulated 
for  any  generalized  regression  consisting  of  minimizing  an  error  random  variable. 

Concerning  computational  methods  for  solving  superquantile  regression  prob¬ 
lems,  we  develop  two  distinct  classes:  the  primal  methods  where  one  solves  the  su¬ 
perquantile  regression  problem  by  means  of  analytical  and  numerical  integration  tech¬ 
niques,  and  the  dual  methods  where  one  utilizes  the  dualization  of  risk  as  part  of  the 
objective  function  of  the  new  regression  problem  that  we  apply  to  discrete  cases. 

In  terms  of  complexity,  our  results  indicate  that  the  dual  methods  outperform 
the  primal  methods  in  most  of  the  cases,  especially  for  large  sample  sizes.  We  com¬ 
pare  computational  methods  by  presenting  their  runtimes  and  realize  that  using  dual 
methods  is  a  quite  fast  process  and  in  fact,  for  reasonable  sample  sizes,  is  not  much 
slower  than  least  squares  regression.  While  the  primal  method  with  analytical  inte¬ 
gration  retrieves  the  exact  solutions,  it  takes  too  long  to  run  and  requires  too  much 
memory  for  sample  sizes  larger  than  1000  observations. 

Our  results  show  that  superquantile  regression  is  computationally  tractable, 
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offers  new  insight  about  the  upper-tail-behavior  for  quantities  of  interest,  and  provides 
a  complementary  tool  for  risk-averse  decision  makers. 

B.  FUTURE  WORK 

Similarly  to  what  is  done  for  quantile  regression,  future  work  could  extend 
statistical  inference  and  predictive  analysis  applied  to  superquantile  regression.  Also 
one  could  further  research  model  validation  analysis  tools,  and  address  significance 
testing  for  superquantile  regression.  Much  research  also  remains  to  be  done  on  su¬ 
perquantile  tracking.  Furthermore,  one  could  build  on  an  .R-package  to  implement 
superquantile  regression  and  the  respective  supporting  documentation. 
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